INTRODUCTION


THIS DISCUSSION of psychological tests involves a number of important 
problems. One of these is the highly complicated nature of the abilities 
which psychological tests measure. The nature of general intelligence and 
of special abilities and disabilities is presented as an introduction to this 
number of the Review. There follows the nature and extent of individual 
differences in various groups and populations in both general and special 
intelligence. A brief presentation of test construction and statistical technic 
is then offered to show how tests are molded and affected by factors other 
than the traits which they are designed to measure. Problems of test construc- 
tion and the theoretical phases of psychological tests are sufficiently compli- 
cated and technical to engross the entire attention and interest of psycholo- 
gists, but in this review they represent only one phase of the entire problem. 
The discussion continues with a survey of the application of test results to 
normal and atypical groups and closes with a brief consideration of voca- 
tional aptitudes tests. 

The value of psychological tests depends largely upon their proper appli- 
cation to various populations and upon the modifications in educational and 
social procedures that result from their use. Testing programs determine 
the need for changes which psychologists are not always equipped to make, 
nor are their interests always pointed in that direction. On the other hand, 
administrators and teachers are sometimes conscious of needed changes, but 
either feel that this is not their responsibility, or that the psychologists 
should complete the changes which seem necessary. Lack of progress is the 
obvious result. The committee feels that along with further improvements 
of testing technic and test construction must come a better understanding 
and mutual interest in the revision of educational procedures. 

The committee has found difficulty in selecting a limited number of 
references from a very large field. In some cases where several investigators 
have reported on similar investigations, the authors have been listed in the 
text without bibliographic references. 

Harry J. Baker, Chairman, 
Committee on Character Tests and Psychological Tests. 











CHAPTER I 


General Intelligence and Its Measurement 


GENERAL INTELLIGENCE and methods used in its measurement are presented 
in this chapter. The historical background of mental measurement is fol- 
lowed by a discussion of the nature of general intelligence. A résumé of 
selected references dealing with current theory and practice in intelligence 
testing then leads to a brief presentation of verbal and non-verbal tests. A 
note on test terminology closes the discussion. 


History of Mental Measurement 


The history of mental measurement as a field of specialized interest in 
psychology has been treated in a number of textbooks concerned with prac- 
tical discussions of individual differences in mental ability. In spite of 
differences in matters of detail, there is a certain uniformity in the selection 
of beginnings, in emphasis, and interpretations. Examples of good discus- 
sions of the history of intelligence testing appear in books by Bisch (8), 
Freeman (34), Peterson (75), and Young (109). Pintner (78) issues a 
yearly review of current literature in the Psychological Bulletin. 

In general, the modern development of tests is traced back to work pub- 
lished in the last two decades of the nineteenth century: by Cattell, on the 
measurement of individual differences in simple sensory and motor proc- 
esses; by Ebbinghaus, on the completion test as a measure of intellectual 
capacity; by Galton, on research in hereditary factors in ability; by Gilbert, 
on the validity of measures of general intelligence determined by comparing 
test results with estimates of ability; by Münsterberg, on logical analyses 
of abilities, but without statistical treatment; by Oehrn, on a system which 
would graph an individual’s ability in a profile of scores on tests whose 
values were determined by a simple correlation technic; and by other 
pioneer workers such as Wissler, Jastrow, Bolton, and Binet. 

In the early years of his experimental work Binet followed the tradi- 
tional line of approach with its theoretical analysis of intelligence. How- 
ever, when he was assigned the task of selecting candidates for special 
subnormal schools in Paris, he met a practical problem that gave new 
direction to his work. To solve this problem he began to combine tests of 
many types into a single scale. The very hodgepodge of single tasks which 
he put together favored him in getting a fairly good measure of general 
ability. His success diverted attention from the measurement of specific 
abilities and led to a concentration of interest on the part of many psycholo- 
gists on the new problem, the measurement of general mental ability. As a 
result, Binet’s technics have been greatly refined. In addition to his scales 
of 1905, 1908, and 1911, Binet made two outstanding contributions to the 
theory of psychological measurement. He developed, first, the concept of 
general intelligence; and second, the concept of mental age. The use of the 
concept, intelligence quotient, to denote relative rates of development, 
formulated by Stern and popularized by Terman, gave new clearness to the 
concept of mental measurement for many persons. Binet’s influence is re- 
flected in the many revisions of the Binet-Simon Test which have appeared; 
for example, that by Bobertag in Germany, 1913; Decroly and Degand in 
Belgium, 1910; Johnson in England, 1911; and American revisions by 
Goddard, 1911; Herring, 1922; Kuhlmann, 1912 and 1922; Terman, 1913 
and 1916; and Wallin, 1911. 
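
The intelligence quotient mentioned above can be illustrated with a short computation. The sketch below assumes the conventional ratio form, mental age divided by chronological age and multiplied by 100, with both ages expressed in months. The function name and the sample ages are illustrative only and are not taken from the works cited.

```python
def intelligence_quotient(mental_age_months, chronological_age_months):
    """Ratio IQ: mental age over chronological age, times 100.

    Assumes both ages are given in the same unit (months here).
    """
    if chronological_age_months <= 0:
        raise ValueError("chronological age must be positive")
    return 100.0 * mental_age_months / chronological_age_months


# Hypothetical child: mental age 10 years (120 months) at 8 years (96 months).
example_iq = intelligence_quotient(120, 96)
print(example_iq)
```

A quotient above 100 indicates development faster than the average rate for the child's age, and a quotient below 100 indicates a slower rate.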

The movement for individual measurement of general ability fused with 
the development of tests for measuring single traits. Special tests, on ac- 
count of their limitations in general interpretations, were overshadowed. 
With the development of new series of tests other advances were achieved. 
Statistical technics were developed to the point where every test could be 
required to meet definite standards with respect to validity and reliability. 
With the growth of emphasis on the correlation method as a technic for the 
selection of valid and supplementary tests, an individual point scale was 
developed by Yerkes in 1915. This was soon followed by many group point 
scales of intelligence. 


Nature of General Intelligence 


The nature of general intelligence has been studied by many workers. 
A summary of early studies in this field is given in the Twenty-first Year- 
book of the National Society for the Study of Education, 1922 (69). 

The psychologists, particularly the physiological psychologists, have 
been very active in this field. Child (17), Herrick (45), and Lashley (59) 
developed certain common general principles which are assumed to govern, 
not only gross physiological growth, but also intelligence. Lashley’s work 
called in question an hypothesis that had been widely accepted, namely, 
that reflexes are isolated conduction paths. He found that cortical functions 
utilized in learning to run mazes were not dependent on the existence of 
specific neural patterns in the cortex. Experimental destruction of cortical 
areas in rats did not show any specific area which must be kept intact in 
order to develop or preserve a particular mental function. He inferred that 
(59: 176) “the mechanisms of integration are to be sought in the dynamic 
relations among the parts of the nervous system rather than in details of 
structural differentiation.” His work pointed toward a theory of the nature 
of these forces. Its bearing on the nature of intelligence may be summarized 
in his own language (59: 173-74): 


[These experiments] lend support to the theory which conceives intelligence as a 
general capacity, in the same measure that they oppose theories of restricted reflex 
conduction. The capacity to form and to retain a variety of maze habits and other less 
well-defined habits seems relatively constant for each individual, dependent upon the 
absolute quantity of cortical tissue functional and independent of any qualitative 
differentiation of the cortex or sensori-motor peculiarities of the problems solved. There 
is an indication that difficult tasks become disproportionately more difficult with de- 
creased cerebral efficiency. Such facts can only be interpreted as indicating the exist- 
ence of some dynamic function of the cortex which is not differentiated in respect to 
single capacities but is generally effective for a number to which identical neural 
elements cannot be ascribed. In this there is close harmony with theories of a general 
factor determining efficiency in a variety of activities. The diverse results obtained in 
the studies of problem boxes and brightness discrimination show that this factor is not 
universally effective. 

Discrimination of amounts of learning ability among individuals led to 
consideration of the cause of these differences. Educational psychologists 
became vitally interested in a practical interpretation of differences, their 
causes and their educational possibilities. An essential problem was the 
prediction of certain traits through the measurement of other traits. Psy- 
chologists soon discovered that simple acts or an individual’s fund of com- 
mon information could be measured with comparative ease. Such measure- 
ments had circumscribed value, however, until it was demonstrated that 
there was a dependable relation between functioning in the measured trait 
and functioning in other traits and at other levels of integration. 

Thorndike (99) studied the relationship between the amount of work 
which an individual could do on a single level (area), the difficulty of tasks 
which he could perform (altitude or level), and the number of units of work 
which an individual could produce in a given time (speed). He set up new 
standards for intelligence, since he found that in tests of known level, with 
performance of measured area and speed, he could reconstruct the intelli- 
gence of each subject, other things being equal. Furthermore, Thorndike 
found a high positive correlation for these three aspects of intelligence as 
defined and measured. With regard to intelligence in its larger aspects, “the 
ability to deal with things or persons or ideas by the use of ideas,” he pro- 
posed the hypothesis that all intellectual operation is identical with the 
processes of association—that higher operations simply require more asso- 
ciations. Although admitting the possibility of a single cause of individual 
differences in intelligence such as vitality—as some had proposed—Thorn- 
dike preferred the hypothesis that the degree of intelligence is increased by 
each increase in the number of connections between ideas. 

Thorndike’s work finds support in that of Tilton (101). Further analyses 
of intelligence in terms of speed of activity were made by Peak and 
Boring (74). 

Spearman (88) analyzed intelligence, making large use of statistical 
methods. He stated that to regard the average score of the individual on a 
series of trait tests as representative of the average of the person’s abilities 
is to go beyond knowledge and sensible assumptions, since to do so assumes 
that each test measures exactly the amount and importance of each trait in 
the individual’s make-up. Since no sound analysis of mental organization 
into comparable units existed, Spearman sought to determine the contribu- 
tion of specific traits to general intelligence by statistical study of the 
results of tests when varying amounts of specific traits were represented 
in the composite tests. His method of analysis, the tetrad difference method, 
made use of the intercorrelation of various traits. Spearman found that 
when correlations between four tests are calculated by pairs so that each 
test of the first pair is correlated with each test of the second pair, a signifi- 
cant relation often exists so that the product of any two correlations is equal 
to the product of the other two correlations. After subjecting this statistical 
procedure to searching mathematical analysis, Spearman interpreted the 
observed relationship to mean that a common factor, which he designated 
as “G,” is measured in each trait. The amount of influence which “G” exerts 
on each of several traits was calculated by Spearman and others. On these 
findings he erected a hierarchy of traits from those which are little in- 
fluenced by “G” to those which are heavily weighted by it. While “G” may 
be called intelligence, Spearman did not define it except to indicate that 
energy in the form of a common factor does exist. In his later analyses, he 
postulated four general factors in performance: “G”, mental inertia, oscilla- 
tions of mental efficiency, and self-control. Of these, only “G” manifests 
appreciable individual differences. He also studied “S” factors, the specific 
elements in intelligence. When a knowledge of “G” and “S” factors makes 
it possible to construct tests which will measure “G” and “S” at the maxi- 
mum, an accurate measure of “G” for any individual can be obtained. This, 
Spearman maintained, was far better than an average derived from a miscel- 
lany of unrelated traits. 
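
The tetrad relation described above can be checked numerically. The sketch below assumes the simplest case, in which every correlation between two tests is the product of their loadings on one common factor. The loadings, the function names, and the added bond between two tests are hypothetical and are not drawn from Spearman's data.

```python
def single_factor_correlations(loadings):
    """Correlation table implied by one common factor: r_ij = g_i * g_j."""
    n = len(loadings)
    return [[1.0 if i == j else loadings[i] * loadings[j] for j in range(n)]
            for i in range(n)]


def tetrad_differences(r, a, b, c, d):
    """Return the three tetrad differences among tests a, b, c, d."""
    return (
        r[a][c] * r[b][d] - r[a][d] * r[b][c],
        r[a][b] * r[c][d] - r[a][d] * r[b][c],
        r[a][b] * r[c][d] - r[a][c] * r[b][d],
    )


# Hypothetical loadings of four tests on the common factor.
table = single_factor_correlations([0.9, 0.8, 0.7, 0.6])
clean = tetrad_differences(table, 0, 1, 2, 3)
print(clean)  # every difference vanishes, apart from rounding error

# Assumed extra bond between tests 0 and 1 (for example, a group factor).
# The two differences that involve r[0][1] no longer vanish.
table[0][1] = table[1][0] = table[0][1] + 0.15
disturbed = tetrad_differences(table, 0, 1, 2, 3)
print(disturbed)
```

Differences near zero are what a single common factor predicts. As the criticism in the following paragraph notes, vanishing differences are also consistent with a theory of multiple factors, so the check is a necessary condition rather than a proof.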

Thompson (96) and Thomson (97, 98) evaluated the tetrad difference 
criterion. The latter stated that it did not necessarily prove the existence of 
two factors, since a theory of multiple factors would explain it as well. 

Courtis (22) studied the nature of intelligence giving particular atten- 
tion to the problem of rate of mental development. Three natural factors 
affect the growth of intelligence: (1) the initial development of the organ- 
ism, or the starting point, which Courtis identified as the “S” in Spearman’s 
theory; (2) the maximum to which the individual grows, also specific and 
the effect of heredity; and (3) the quality of the organism, which determines 
the rate of growth in a given environment, identified as Spearman’s “G” 
factor. Because of the interaction of all these in learning, rate of growth 
can be measured only by the use of growth units which enable one to deter- 
mine the effects of the three factors separately. Courtis sought more appro- 
priate units for growth measurement than those already in use, and after 
making a series of measurements, developed certain hypotheses concerning 
mental growth. He found that growth proceeded at a uniform rate under 
uniform conditions when measured in the units which he had employed. 

For many years Dodge (28) carried on intensive laboratory work on the 
problem of human variability. He concluded that before each response to 
a situation there was a combining process in the nervous system, so that 
successive responses to the same overt stimulus varied according to the con- 
ditions within the nervous system. This he found to be true at both lower and 
higher levels of mental integration. He summarized the function of general 
intelligence in this combining process as follows (28: 107): 


Our picture of human adjustment is not a mosaic of reinforcing or conflicting 
reflexes, instincts, habits, and voluntary acts, or a succession of discrete responses 
under these various categories, but a dynamic continuum, a sort of spiral process with 
a relatively simple front of overt reaction at any given moment and a highly complex 
background. Adequate experimental analysis would probably show that each overt 
reaction is really a complex of approximate beginning reactions and elaborated adjust- 
ments. The beginning reactions are evoked by current stimuli superposed on the 
remains of consummated responses to past stimuli by which they are inhibited, 
reinforced, or qualitatively modified. 


This statement not only proposed a basis for the nature of general intelli- 
gence but also explained the cause of variability in response, 
a problem which had been troublesome in the building of a theory of a 
single factor in general intelligence. In further elaboration Dodge said 
(28: 134): 


Persistent cortical systematizations are discoverable in all perceptual and thinking 
processes, but they are not constants. On the contrary, they are modified more or less 
by every related experience and behavior, and subject in successive instances of their 
arousal to all the modifying influences of refractoriness, inhibition, reinforcement, 
relative fatigue, and re-systematization. In a unique way they appear at the very 
heart of our consciousness and behavior. 


Peterson reviewed Spearman’s and Thorndike’s analyses of intelligence 
and concluded that (75: 273): 


There can be little doubt that a somewhat median position between the extreme 
early views of Spearman, on the one hand, and of Thorndike, on the other, is nearer 
the truth. Neither view seems to be supported by a consideration of the neuromuscular 
bases of behavior. At any rate, that intelligence is associated with biological structures 
which are complete in development at a comparatively early age—at about sixteen 
years of age—may now be regarded as established. 


Kohs reached still another conclusion concerning the nature of general 
intelligence. He said (56: 10): 


Differences in level of mental ability may some day be explained, among other 
things, on the basis of differences in fundamental synthesizing ability, or the capacity 
of the nervous system to fuse elementary states of consciousness into higher thought 
forms. 


Kelley (53) adopted Spearman’s method of determining the number of 
common factors, namely, the statistical analysis of test results. Kelley de- 
rived criteria for determining the number of common factors present. He 
argued that these common factors may vary in number and also in effective- 
ness on test scores and that their influence can be measured. A few of the 
common factors which he assumed to exist are verbal ability, memory, and 
interest. He also found that certain group factors were at work. Kelley’s 
analysis gave a new meaning to the term general intelligence. Meili (66) 
supported Kelley’s findings with his own data, then used Gestalt concepts 
to explain results. A general and theoretical discussion of intelligence has 
been given by Richet (81). Claremont (18) found, through an analysis 
of intelligence as the ability to perceive causal relations, that not only man 
but animals may be intelligent. 

Hull (49) and Thomson (97, 98) reached similar conclusions with 
regard to the existence and nature of group factors in human behavior. The 
Minnesota investigation of mechanical aptitude (73) indicated an absence 
of any general factor, although it seemed to indicate the presence of certain 
group factors. Four such group factors are felt to be unique: mechanical 
ability, intelligence, agility, and height. 

Data have been reported and analyzed in numerous other writings of 
which the following are representative. Boge (10) analyzed practical in- 
telligence into analysis, synthesis, and functional reactions. His methods of 
measurement did not isolate these parts, but he tested them in simply organ- 
ized situations. Cornell (21) studied the practical effect of trait differences 
in educational situations. Dodd (26, 27) considered similarities and dis- 
tinctions in the theories of Spearman and Thomson. Freeman (35) showed 
by experimental methods that “improved environmental conditions result 
in a significant improvement in intelligence (as measured by the tests).” 
Holzinger (47) supported the contentions of Spearman, using Thorndike’s 
measurements of intelligence for the data of his statistical interpretations. 
Hull (50) found that an individual’s variability with respect to thirty-five 
different traits was distributed approximately according to the normal curve. 
Kelley (54) analyzed searchingly the theoretical nature of “G”. Spearman 
(89, 90, 91) presented his conclusions and pointed out the divergence of 
his theory from those held by other investigators. 

Laycock (60) utilized Spearman’s analysis of ability and built a series 
of measurements of adaptability to new situations to comply with these 
specifications. Slocombe (86, 87) endeavored to apply the concept of “G” 
to test data to find a measure of the accuracy and value of the tests, both for 
the measurement of “G” and for an explanation of variations in test data. 
Strasheim (94) constructed a scale of tests in keeping with Spearman’s 
analysis of factors in intelligence. 

Other questions with regard to the relation between existing measure- 
ments and the true nature of intelligence were studied by Cannon (15), 
Commins (19), Pintner (76), and Wilson (108). 


Intelligence Testing 


Freeman (34) discussed the group and individual tests which were 
available in 1926. Pintner (77), in a similar comprehensive textbook, 
presented both theoretical and practical considerations with respect to 
intelligence testing. Hildreth (46) compared tests as to their practical 
usefulness in a school’s testing program. Her manual is valuable for 
school executives and specialists. Dougherty (29) issued a monograph 
dealing critically with group tests in comparison with each other and with 
the Stanford Revision of the Binet Test. In it, tests are ranked on the basis 
of their agreement with each other. Raw scores were used in Dougherty’s 
comparisons. Group tests were also critically reviewed by Gambrill (36), 
M’Graw and Mangold (64), and Sangren (83). A series of short articles, 
dealing with various group tests, scores made, and the interpretation of 
results for classroom use, are given in a volume edited by Bell and 
Suhrie (7). 

In the field of individual tests, books by Bronner (11), Burt (14), 
Pintner (77), and Wells (106) made important contributions. Particular 
tests were evaluated by Bell and Suhrie (7), Boge (10), Cannon (15), 
Cornell (21), Davey (24), Goodenough (38), Robinson and Robinson 
(82), Spearman (88), and Strasheim (94). 

Conditions other than intelligence which affect test scores were studied 
by Brooks (12), Conrad (20), Fox (33), Spearman (88), Weston and 
English (107), and Wilson (108). This problem was also discussed in 
the report dealing with the Minnesota Mechanical Aptitude Tests (73). 
Marine (65) found that familiarity with the examiner made little difference 
in the test scores of subjects tested individually. 

Practical criteria for evaluating tests were given by McCall and Bixler 
(63). They also presented methods of applying information gained from 
tests in student guidance and personnel administration. Hughes (48) 
pointed out certain relations between general and specific trait charac- 
teristics. Oates (72) developed the age-maturity criterion. 

Graf (41) reported an ingenious method for standardizing tests. Select- 
ing a particular age group—twenty year olds—he so standardized his test 
that an individual’s score could be compared with those of individuals 
having had similar educational experiences. 

A publication by Saunders and Putnam (84) gave a popular presentation 
of intelligence tests together with an explanation of their purpose. An ex- 
cellent manual of tests for French-speaking nations was written by Decroly 
and Buyse (25). 


Verbal Tests 


The psychological examination of men in the national army during the 
World War gave great impetus to the development and use of group in- 
telligence tests. Army Alpha was the first verbal group intelligence test to 
be used widely in the United States. It was soon followed by others; for 
example, the Otis Test in 1918, the Pressey Test in 1918, the Haggerty Test 
in 1919, the Whipple Test in 1919, and the National Intelligence Test by 
affiliated psychologists in 1920. Different forms of the Kuhlmann-Anderson 
Group Test (58) made it adaptable for use in all grades from the first 
to the twelfth, and for use with adults. Among other well-known tests 
are the American Council Psychological Examination, the Brown Uni- 
versity Psychological Examination (designed by Colvin), Mentimeters (by 
Trabue and Stockbridge), the Miller Mental Ability Test, Roback Men- 
tality Tests for Superior Adults, the Terman Group Test of Mental Ability, 
Thorndike Intelligence Examinations, Thorndike’s CAVD Scale, and the 
Thurstone Psychological Examination for College Freshmen. Thorndike’s 
CAVD Tests are graded to fit various levels of development. They are 
more extensive than the majority of test series developed for practical 
use. 

A group intelligence test was developed by Cattell (16) in England, 
using tests that correlated well with the “G” criterion. Davey (24) also 
developed a test series in England. She recommended the use of pictorial 
tests. Siegvald (85) described another series of tests developed in Europe 
and reported their successful use in schools. 

A number of tests involve both the measurement of intelligence and that 
of achievement in school subjects. Examples of such tests are the Illinois 
Examination, the Otis Classification Test, the New Jersey Composite Test, 
and the Pintner Mental-Educational Survey Test. 

The most widely used of the verbal individual tests were cited above in 
connection with the work of Alfred Binet. They are, for the most part, 
revisions and adaptations of the Binet Test. 


Non-Verbal Tests 


Group tests of a non-language type have been developed for use with 
children too young to read and for use with illiterates or handicapped 
readers. As in the case of verbal tests, the first widely used non-language 
test, Army Beta, was developed for use in the national army. Other non- 
verbal tests soon appeared; notably those by Myers, Pintner, Thorndike, and 
Dearborn. Myers’ Mental Measure was a non-language test designed for 
persons of all ages. The Pintner Non-Language Test was designed for 
children in the upper grades who could not be tested fairly with verbal 
tests because of language handicaps or other disabilities. The Thorndike 
Non-Language Test was designed for adults. Dearborn’s test was developed 
for use with children too young to read. 

A number of tests designed for young children are not entirely non- 
verbal, particularly when they are intended for primary children at various 
age and grade levels. Since they make large use of non-language technics, 
however, they are summarized at this point together with those that are 
strictly non-verbal in character. 

Goodenough (39) developed a single test for the measurement of in- 
telligence through children’s drawings. The drawing of a man is rated on 
the basis of the presence or absence of details according to standards de- 
veloped. Van Alstyne (104) constructed a picture vocabulary test for pre- 
primary children. Reports on other tests for young children were given by 
Baldwin and Wellman (4), Bayley (5), Goodenough and Anderson (40), 
Hallowell (44), Hutt (51), Linfert and Hierholzer (62), Murchison 
(67), and Newell (71).¹ Test series designed for a somewhat larger age 
and grade range were reported by Bühler (13), Cunningham (23), Gesell 
(37), Kuhlmann (57), and Stutsman (95). California, Minnesota, and 
Stanford universities are working on thorough standardizations of new test 
series. 


Among other tests for pre-school and primary children are the Cole- 
Vincent Group Intelligence Test for School Entrance, the Detroit First 
Grade Intelligence Test, Haggerty Intelligence Examination, Delta 1, 
Myers’ Pantomime Group Intelligence Test, the Otis Group Intelligence 
Scale, Primary Examination, and the Pintner-Cunningham Primary Mental 
Test. 

The paper non-language tests, as illustrated in the one developed by 
Baker (3), provide a method of measurement for children who cannot be 
tested in the regular manner. Work on an international test, a non-verbal 
test sponsored by the National Research Council, represented a new de- 
parture in test development. Squires’ (92) work on a universal test was an 
interesting attack on the same problem, namely, that of measuring the 
intelligence of various national groups with a non-language test. The 
Michigan Non-Verbal Test (42) was published in 1930. 

A non-verbal system of tests suitable for persons from six years of age 
to adulthood was worked out by Vabalaz-Gudaitas (103). The entire series 
makes use of the same materials and requires simple motor responses. 
The author’s analysis of the functions tested differs from analyses com- 
monly given in the United States. Discussion of the results by American 
students may lead to different conclusions. 

Many of the non-verbal tests make no use of pencil and paper. These 
tests, which are called performance tests, give the subject materials to 
manipulate. Illustrative tests of this type include the Arthur Point Scale, 
re-standardized in 1928 (2), the Dearborn Formboards (11), the Fer- 
guson Formboards (11), the Healy-Fernald Test Group (11), Kelley’s 
Construction Test (52), Kent and Shakow Formboards (55), the Knox 
Test Group, including the Knox Cube Test (34), the Kohs Color Cubes 
(56), the Lincoln Hollow Square (61), the Pintner-Paterson Performance 
Tests (79), and the Porteus Maze (80). 

Drever and Collins (30) developed a performance test for use in measur- 
ing the intelligence of the deaf. They found that the deaf did better on the 
test than normal children selected for comparison. 

Recently Arnstein, Hertzer, and Kusching (1) and Blacking (9) worked 
on the standardization of tests of bead stringing. Beigel (6), as shown in 
a publication from Utrecht, utilized performance tests to measure the 
ability to combine or integrate. Boge (10) used performance tests to 
analyze the kind of “practical intelligence used in the solution of every- 
day problems.” 


¹ The writer is indebted to Dr. Bessie Lee Gambrill for access to an unpublished 
bibliography covering recent research in child psychology. 

An earlier report by Gaw on the vocational uses of performance tests 
was continued in a publication by Earle, Milner, and others (31). Pater- 
son (73) reported results obtained with the Minnesota Paper Formboards 
Test. General discussions of the use of performance tests are those pub- 
lished by Murdoch (68) and Turner (102). 


Test Terminology 


Terms used in connection with intelligence and its measurement have been 
somewhat more clearly defined during the past few years. An article by 
Warren and others (105) gave an analysis of psychological terms which 
was useful in clarifying a number of them. Another useful publication 
was the Dictionary of Psychological Terms prepared by English (32). 
It is intended to give the layman and beginner an understanding of the 
more common meanings attributed by psychologists to words used techni- 
cally. Although it does not satisfy all schools of psychological thought, 
it is well edited. Thurstone’s (100) discussion of mental age helped to 
clarify the meaning of that term. Spearman (90) discussed an interesting 
issue with regard to our psychological vocabulary. 

In spite of the useful helps cited above, psychology suffers from the 
numerous confusions in meaning which arise. Many authors find it neces- 
sary to explain and define their terms and concepts before reporting ex- 
periments and interpretations. Thus, Greene and Jorgensen (43) have a 
glossary in their text, and Kuhlmann (58) defined his own terms and 
explained a number of concepts not in ordinary use. 











CHAPTER II 


Special Abilities and Disabilities and 
Their Measurement 


Although major emphasis has been placed on tests of general intelligence
during the past few years, considerable attention has been given to the
development and evaluation of tests of particular abilities. This chapter deals
with the nature and measurement of special traits.


Nature of Special Traits 


Kelley (154: 8) pointed out that it was essential to the growth of scien- 
tific psychological knowledge that we know with increasing precision facts 
about special traits and their relation one to another. In his work, he was 
primarily interested in the relations between traits. Through a study of these 
relations he endeavored to differentiate traits which were independent but 
whose independent existence had not been proven. The higher mental proc- 
esses fell within the scope of his studies. By statistical analysis Kelley con- 
cluded that several factors were common to two or more traits but that they 
were not necessarily the same in all the traits. He set up for further study 
the following factors: maturity, sex, race, verbal facility, perseveration, 
and oscillation. His proposed method of attack was to gather all available 
facts concerning a certain test population and then to work out in detail by 
statistical analysis the relation between the several traits. This method has 
much to commend it although at the present time it may not be practical. 
There is some question as to whether we can measure with sufficient accuracy 
all the factors entering into the performance of a test population. 

The precise work of Dodge (129) was concerned only with a few sub- 
jects carefully trained for especial observations. His laboratory technics 
were exemplary, while still more important were his masterly analyses and 
syntheses dealing with the theory of special traits. Were all our data for 
different aspects of behavior as carefully collected and as well organized 
theoretically, statistical measures could be applied, and whatever relations 
were found among the traits could be interpreted with assurance. Significant 
improvements in technics of measurement indicate that in the future the 
technic of statistical analysis will have increasing value. 

Many attempts have been made to measure traits as though they were 
independent. The skill with which the studies have been set up has varied 
with the workers’ understanding of the interrelations of traits. Considerable 
confusion as to just what was measured has often resulted. Frequently the 
bit of mental life considered as a special trait has been abstracted for some 
special purpose, or for some particular theoretical or practical need. The 
measurement of the trait as independent has been declared successful in 
terms of its accuracy in meeting that need, regardless of any proof as to the
independence of the trait itself. 

As technics for trait measurement and genetic studies develop, those 
which are discriminative can be selected pragmatically. Traits can then be 
studied by statistical methods alone or in combination. Pioneer studies of 
that type were made by Courtis (124), Kelley (154, 156), Paterson (186), 
Spearman (199, 200), Thomson (201, 202), Thorndike (203), Thurstone 
(205), and Toops (207). 

The meaning of special traits and the relation of special traits to general 
ability is discussed in Foundations of Experimental Psychology, a book 
edited by Murchison (183). Liebman’s (170) work, general in character, 
is an interesting theoretical discussion. Both Dearborn (127) and Scheide- 
mann (191) considered the nature of special ability in their books. The 
latter was more concerned with tests and with a practical exposition of our 
present knowledge. Articles by Cannon (119), Hughes (145), and Robin- 
son (188) reflect the widespread interest which exists in interrelations 
between specialized traits and general ability. They also describe useful 
technics. 

Measurement of variability in traits, the relation of variability in one 
trait to variability in another, and the significance of variability as a mental 
characteristic have all been studied. Commins (122) brought out clearly 
the importance of studying test scores to secure data on the variability of 
the subjects. Foran, Lillis, and O'Leary (133) studied trait variability. 
Methods of expressing variability were studied by Thurstone (206), as 
well as by Anderson (111) and Heinis (141). Variability as a character- 
istic of individuals, or types of individuals, was studied by Antipoff (112), 
Brown (117), Hull (147), Kelley (154, 155), Robinson (188), Wallin
(209), and Woodrow (211). The results were somewhat contradictory. 
Hull found that the distribution of measures of variability for a given in- 
dividual approximates the normal frequency curve, although individuals 
differ greatly with respect to the range of the variabilities. Brown found 
about equal variation in trait measures for bright and dull persons. Wallin 
found that the normal were more variable but in Woodrow’s study the 
normal group showed less variability. 


Measurement of Special Traits 


Because of differences in the purposes and methods of the experimenters 
and because traits were defined differently by various workers, it is difficult 
to compare trait measures and indeed to identify them accurately. The field 
of trait measurement overlaps that of achievement testing and, to some 
extent, that of general mental measurement. However, collections of trait 
tests were brought together and presented by Bronner (116), Hull (146), 
Paterson (186), and Wells (210). Wells was concerned chiefly with meas- 
ures of ability appropriate for clinical use regardless of their implications 
concerning the nature of traits. Bronner listed a number of separate tests
and test series and cited references dealing with them. Hull outlined and 
interpreted the needs in aptitude testing and summarized the work already 
done in that field. Paterson discussed the theoretical and practical implica- 
tions of measurement of special traits. 

Musical abilities were analyzed by Seashore (194). Careful diagnostic 
work was also done by Kwalwasser (167). Fracker and Howard (134) 
reported a low relationship between intelligence and musical talent among 
college students. McCarthy (174) confirmed Seashore’s findings as to the 
accuracy of the Seashore Tests in the measurement of musical ability.

Meier (178, 179, 180) reviewed the literature on artistic talent and 
reported original work in the measurement of artistic ability. About the 
same time, Goodenough (139) studied children’s drawings as an index of 
maturity. More recently, an excellent series of tests of artistic ability was 
reported by Lewerenz (168, 169). He found little relation between test 
scores alone and ability as measured by art schools. 

The eidetic phenomenon² has attracted much attention since Jaensch’s
(149, 150, 151, 152) first work in the field. In addition to his studies,
work was done by Cramaussel (125), Garfunkel (136), Gatte and Vacino
(137), Gengerelli (138), Hansen (140), Joesten (153), Kiesow (159,
160, 161), Klüver (162, 163, 164), and Zeman (212).

Comparative and genetic studies of children’s development by means of 
standardized tests, or by group tests originated by the author, have been 
common. Several of these, concerned chiefly with establishing central ten- 
dencies, appeared in Japan. The investigations of Kido (158), Kuwata 
(166), and Narasaki (184) showed this trend. 


Vocabulary was used as a measure of intelligence or learning ability by
Conrad (123), Cuff (126), Dolch (130), Schneck (192), and Sham- 
baugh (195). Their results did not give a final answer to the question as 
to the relation between verbal ability and general intelligence. 

Beck (113, 114), Kovarsky (165), and Loosli (172) reported the 
technic employed by the Rorschach Profile Test and analyzed the theory 
underlying the test. Rossolimo (190) developed another profile test de- 
signed to measure intelligence and other personality traits. Both tests have 
been widely used. 

Intelligence and achievement tests have been utilized in studying the tests 
themselves and in determining the effects of learning. DeWeerdt (128) and 
Slocombe (197) analyzed, under controlled conditions, the effect of different
amounts of learning on test scores. They concluded (1) that there was
little effect from similar learning, and (2) that the best way to eliminate
uneven effects of previous direct learning on test scores was to give adequate
fore-exercises, or to repeat the examination.

2 Eidetic images are subjective visual phenomena resembling after-images. An eidetic
individual is not only able to imagine an absent object but also to see it, either when he
closes his eyes or when he looks at some surface which serves as a convenient background.
Eidetic images differ from after-images in that an object may be seen after a
considerable period of time has elapsed since its removal, perhaps several days. The
individual who possesses eidetic images is in general a normal and healthy person.

The relation of motor ability to general intelligence was studied by 
Piaget (187). He reached no statistically verifiable conclusions. Hindman 
(143) reported that, in a certain university, those with better motor skill 
made slightly lower scores. Results from a general series of motor tests in 
relation to sensory traits and intelligence were reported by Sommerville 
(198). The test series developed by Oseretzky (185) was also reported 
by Kernal (157). Miles (181) developed the Pursuitmeter, a motor test 
that has been widely used as a measure of skill in simple coordination. 
Brace (115), Garfiel (135), Schultz (193), Shevalev (196), and Walker 
and Weedon (208) were all concerned with developing tests of motor 
skill. Likhacheva (171) studied motor ability in children from the point 
of view of physiology. 

Association as a mental phenomenon has been studied by means of asso- 
ciation tests of different types. Malmud (177) expressed a preference for 
controlled association tests because of their greater accuracy. Others pre- 
ferred the free association tests. Rosanoff (189) published a reprint of the 
Kent-Rosanoff Test. Cason (120), Conrad (123), and McFadden (175) 
utilized free association tests in their analyses of adjustments and psycho- 
logical types. 

Immediate memory was studied by Fischler and Ullert (132) and by 
Dodge (129). 

Cleeton (121) summarized the literature dealing with experimental 
work in the field of reasoning. After that summary was published, impor- 
tant work was done by Alpert (110), Burt (118), Hicks (142), Huang 
(144), Lorimer (173), Maier (176), Moore (182), Piaget (187), and 
Thurstone (204, 206). Erismann (131) made a genetic study of reason- 
ing, using reasoning tests. Isaacs (148) refuted Piaget’s assumptions as to 
the growth of reason in the child. 
CHAPTER III


The Nature and Extent of Individual Mental 
Differences 


The nature and extent of individual differences are dependent in part
upon the growth of intelligence, upon the age of mental maturity, and upon
the distribution of intelligence in the general population. These topics,
together with the effects of sex, race, and social status upon individual
differences, will be discussed in this chapter.


Growth of Intelligence and Adult Mental Age


There has been a great amount of controversy as to the exact age at 
which mental growth ceases. This problem first became troublesome when 
Terman (257), in computing the I. Q., found it necessary to select some 
chronological age for the adult level beyond which no increase in mental 
growth was expected. Upon the basis of his adult samples, he selected 
sixteen years as the point of adult maturity. More complete samples of 
the adult population as disclosed by the testing program of the national 
army during the World War showed that adult white drafted soldiers 
averaged no better in general mental level than did school children between 
the ages of thirteen and fourteen years chronologically. Acute differences of 
opinion arose over whether or not the army draft represented an adequate 
adult sample. 

This problem is not simple. Discussions of the various factors which 
contribute to its complexity were presented in texts by Dearborn (223), 
Freeman (225), and Thorndike (259). Dearborn showed that the content
of intelligence tests is so similar to that of scholastic tests that adults
removed from the immediate school environment temporarily forget fac- 
tual material. Hence adults appear to be no more intelligent than pupils 
in the sixth grade. Freeman presented and discussed the data available 
prior to the publication of his text in 1926. He also presented theoretical 
hypotheses as to the relation between level of intelligence and age of 
maturity, as follows: (1) that all individuals may develop and reach 
maturity at about the same age; (2) that individuals with high I. Q.’s 
may reach maturity sooner than those with the lower ones; and (3) that 
the brighter children may reach maturity at a later time than those with 
lower I. Q.’s. Data on the growth curves of the mentally subnormal are 
becoming available in clinics in rapidly increasing amounts. The phenomenon
of the falling I. Q. with increasing chronological age among subnormal
children has been recognized for some time. Using data from the
files of the Detroit Psychological Clinic, the writer computed I. Q.’s for
several hundred subnormal children and compared them with earlier
I. Q.’s before the chronological age of sixteen years was reached. In dividing
mental age by sixteen the I. Q. tended to fall; in dividing by fourteen
there was a slight rise in the I. Q. At the age of approximately fourteen
years and four months, the I. Q.’s tended to remain constant. Fewer data
are available relating to average or superior children. For subnormals, 
the evidence indicates that maturity is reached soon after the age of fourteen 
years. Thurstone and Ackerson (261) reported on the Binet M. A. of 
4,208 white children with chronological ages from three to seventeen
years. The mental curve had a positive arc to the age of ten years, then an
inflection point between nine and twelve years, after which it became asymptotic
to the adult level. They concluded that the adult level was reached earlier
by bright pupils than by dull ones. 
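
The effect of the divisor chosen for the adult level can be illustrated with a small computation. The sketch below is not taken from any of the studies cited. It assumes the conventional ratio I. Q. (mental age divided by chronological age, times 100), with chronological age capped at an assumed adult level; the function name `ratio_iq`, its parameters, and the sample ages are hypothetical.

```python
def ratio_iq(mental_age_months, chron_age_months, adult_level_years=16):
    """Ratio I.Q.: 100 * MA / CA, with CA capped at the assumed adult level.

    The cap (adult_level_years) is the divisor under dispute in the text:
    sixteen years in the original scheme, fourteen as an alternative.
    """
    if mental_age_months <= 0 or chron_age_months <= 0:
        raise ValueError("ages must be positive")
    divisor = min(chron_age_months, adult_level_years * 12)
    return 100.0 * mental_age_months / divisor


# Hypothetical case: mental age remains at 10 years 6 months (126 months)
# while chronological age advances from 14 to 18 years.
for years in (14, 15, 16, 18):
    ca = years * 12
    print(years, round(ratio_iq(126, ca, 16), 1), round(ratio_iq(126, ca, 14), 1))
```

With mental age held constant, the I. Q. computed with a sixteen-year divisor keeps falling until sixteen, while the I. Q. computed with a fourteen-year divisor stays fixed after fourteen. This is only an arithmetic illustration of why the choice of adult level matters, not a reconstruction of the clinic data.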

Thurstone (260) reported on the fallacies of using the mental age concept
at the adult level and proposed the use of percentile scores. These,
he believed, could be compared with greater accuracy and with them more 
meaningful standards could be developed. 
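
A percentile score of the kind proposed can be sketched as follows. This is a minimal illustration under stated assumptions, not Thurstone's own procedure: the function name `percentile_rank` is hypothetical, the reference scores are invented, and ties are handled by the mid-rank convention (half of the tied cases counted as falling below).

```python
def percentile_rank(score, reference_scores):
    """Percent of the reference group scoring below `score`.

    Ties count as half below (mid-rank convention, an assumption here).
    """
    if not reference_scores:
        raise ValueError("reference group is empty")
    below = sum(1 for s in reference_scores if s < score)
    equal = sum(1 for s in reference_scores if s == score)
    return 100.0 * (below + 0.5 * equal) / len(reference_scores)


# Invented adult reference group of ten raw scores.
adults = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
print(percentile_rank(5, adults))
```

A score expressed this way refers only to standing within the adult reference group and requires no assumption about the age at which mental growth ceases.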

Thorndike (259) discussed the difficulty of differentiating between the 
materials in intelligence tests which measure maturity and those which 
measure training. He pointed out that individuals may cease to improve 
in altitude upon the CAVD Test and still continue to improve in such things 
as business, child management, and social arrangement. He suggested that 
the units of measurement used in intelligence tests may readily lead to 
faulty conclusions. He pointed out that the methods of partial and multiple 
correlation must eventually be employed with significant data to differen- 
tiate the two factors, maturity and training. 


Individual Differences 


Pioneer work in the field of individual differences was done by Galton, 
Cattell (218), and Pearson. The first systematic presentations were made 
by Stern (253) in 1900 and by Thorndike (258) in 1903. A tremendous 
amount of literature has appeared since that time. An increasing recogni- 
tion of the importance of individual differences is one of the distinguishing 
characteristics of educational theory and practice during the past twenty- 
five years. The determination of individual differences as accurately as 
possible and the guidance of individuals into the vocations for which they 
are best fitted have come to be regarded as important educational problems. 


Early investigations showed that the distribution of differences in mental 
traits covers an enormous range. A marked looseness in the interrelation- 
ships between traits was disclosed. These first studies also dealt with the 
effects of race, sex, family, maturity, and training in producing differences. 
In more recent years attention has been centered on the improvement of 
tests and measurements, on refinements in statistical methods of determin- 
ing differences, and on more rigorous control of the factors of selection 
and training in the groups tested. A decidedly more critical attitude toward
the principles and technics of mental measurement has arisen. 

A widely accepted doctrine in individual psychology is that the distribution
of individual differences in all mental traits conforms approximately to the 
normal probability curve. Boring (216, 217) critically and vigorously 
attacked this doctrine and expressed skepticism as to the importance of 
the discovery of the applicability of the normal distribution to mental 
traits. He contended that strictly unselected groups were never available; 
that differences might be due to inequalities of training, to inequalities in 
units of measurement at different ranges and levels, and to errors in 
measurement. Kelley (238) replied to Boring at length and suggested four 
units in mental measurement: (1) the sense difference unit; (2) the vari- 
ability in performance unit; (3) the group variability unit; and (4) the 
unit resulting in the simplest picture of interrelationships. Thorndike 
(259) recognized the uncritical character of the a priori assumption of a 
normal distribution, but realized also the great importance of the problem 
for present and future practice in mental measurement. He demonstrated 
that the distribution of general intelligence for children in the sixth to 
the twelfth grades approximated the normal curve when three precautions 
were observed; namely, when equal units of measurement were employed, 
when inequalities in training and the influence of selection were eliminated, 
and when errors in measurement were reduced to the minimum. 

Sex differences—The belief prevails in popular opinion that sex differ- 
ences are highly significant in psychological measurement. Objective data 
available to date do not confirm this hypothesis. Freeman (225) sum- 
marized the earlier studies and reached this conclusion in substance. 
Whipple (264, 265) reported two studies of sex differences in general 
intelligence. Among eleven-year-old children in the elementary school, girls 
surpassed boys by approximately seven months in mental age according to 
their scores on the National Intelligence Test. He also reported that at the 
high-school level, on Army Alpha, boys were slightly superior to girls. 
Winsor (266) reviewed a bibliography of forty-nine titles on sex differ- 
ences and concluded that the two sexes are equally variable whenever signifi- 
cant numbers are tested at any given age. In the opinion of Book and 
Meadows (215) the superiority of girls over boys from ages nine to six- 
teen is probably related to the recognized accelerated physical development 
of girls over the rate for boys during the period of adolescence. Investiga- 
tions of sex differences with respect to mirror drawing, as reported by 
Clinton (219), showed that boys excelled girls up to thirteen years of age 
and that after that point the girls excelled the boys. Evidence from many 
sources has shown that in general or special mental ability sex differences
are usually too insignificant to warrant the establishment of separate sex
norms.
Race differences—In recent years race differences have also been widely
investigated by means of psychological tests. More than fifty articles in 
periodical journals reported research studies in this field in the past four 
years alone. In the opinion of the committee many of these studies do not 
adequately sample the racial groups. It has often been assumed that immi- 
grant racial groups are definitely inferior to the native stock. The chief 
reasons for this belief may be easily identified. In the first place, language 
difficulties have set up barriers and often caused these groups to do poorly 
on verbal tests. Second, although our customs and laws are different from 
theirs and are not easily understood by immigrants, those who do not readily 
conform are judged to be inferior. Finally, an immigrant population may 
be an inferior and inadequate sample of its own ancestral stock because of 
social and economic selection. In most of the investigations thus far re- 
ported psychological test scores for the various racial groups have been 
compared with norms for the native population. Murdoch (245) used the 
Pressey Group Intelligence Test with twelve-year-old children in New York 
and Honolulu. In New York, where native born white children excelled, 
Jewish and gentile groups ranked about equally high, negroes were next 
in order, and Italians were lowest. In Honolulu, north Europeans and 
Americans were highest; these were followed by Chinese, American- 
Hawaiians, Japanese, Koreans, Chinese-Hawaiians, Portuguese, Hawai- 
ians, Porto Ricans, and Filipinos. Hybrids fell between the two parent 
stocks. Porteus and others (248) in a similar study found essentially the 
same gradations of intelligence in these racial groups. Goodenough (229) 
tested 2,457 young children with the Goodenough Intelligence Test for 
Young Children which is independent of language. Her work was per- 
formed in southern and western states. In order to balance social and eco- 
nomic status as nearly as possible she did not include children from supe- 
rior American homes. Nevertheless she found that the negroes and south 
European stocks were distinctly inferior. 

Several investigations of the intelligence of negro groups have been made. 
These tend to show inferiority to American whites in practically all in- 
stances. On the contrary, Sunne (256) concluded, after some experimental 
testing, that there were no race differences between whites and negroes which 
could not be accounted for through differences in educational opportunities, 
and Petersen and others (247) reported that negro children excelled whites 
in special tests of retention and memory. Garth and others (228) and 
Graham (232) reported minor differences in young children, but differ- 
ences increased markedly with age in favor of the whites. Strachan (254), 
using the Stanford-Binet on kindergarten and primary children in Kansas 
City, found negro inferiority among these younger pupils as well as among 
the older ones. Price (249) and Graham (231) reported deficiencies of 
ten points on the Otis Self-Administering Test for negro college freshmen. 
Sixty-three percent of the whites surpassed the negro median in intelligence. 

Reports on Indian children tend to show an inferiority on the part of this
racial group. Jamieson and Sandiford (235) found median I. Q.’s on south- 
ern Ontario Indians as follows: National Intelligence Tests, 80; Pintner- 
Cunningham, 78; Pintner Non-Language, and Pintner-Paterson, 97. Black- 
wood (214) concluded that Indian mental inferiority is partly due to lack 
of knowledge of the civilized conditions which mental tests presuppose. 
Travel seems to produce and stimulate culture among Indians, for Garth 
(226) discovered that 67.3 percent of the sedentary Indian groups fell 
below the median of the nomadic groups. Klineberg (240) found that the 
Indians were relatively better on performance tests than on verbal tests, 
which suggests language difficulty and lack of educational opportunity. 
Mixed-blooded Indians, as reported by Garth (227), tend to be slightly 
higher in intelligence when the white factor is more predominant. 

A study of Oriental races reported by Wen (263), contains an interesting 
résumé of a testing program carried out with Chinese children. He also 
summarized the work of eight American psychologists. Wen found no 
marked evidence of racial inferiority or superiority. Surveys of Japanese 
children by Darsie (221) indicated weakness in those mental processes 
which involve memory and abstract thinking in the English language, but 
superiority to American children in equivalent tests with non-verbal 
material. 

Social and economic status—Considerable evidence was derived from the 
army tests indicating a high correlation between intelligence and social and 
economic status. More complete analysis of the data, as reported by Lehman 
and Stoke (241), showed that fully one-half of the A and B caliber men 
were drawn from the non-white-collar occupations. Two studies of the in- 
telligence of school children and the occupational status of their parents 
were reported by Collins (220) and Goodenough (230). The traditional 
excellence of the professional groups was reflected in the scores of their 
children on the mental tests, but there was considerable overlapping of the 
two distributions. This precludes the idea of a complete class separation. 
Van Dael (262) reported similar findings from a study in the Netherlands, 
as did Kirahara (239) from a study of elementary-school children in
Japan. Studies based on large numbers of cases including the entire popula- 
tion of communities are quite lacking. 

Shimberg (251) gave two standardized tests to urban and rural children, 
and although the rural children made a poorer showing, the difference was 
attributed in part to the fact that the test material was more favorable to 
city than to rural children. Hirsch (234) measured 1,845 school children 
in the Kentucky mountain districts. He found an average I. Q. in the high 
70’s and a negative correlation between I. Q. and C. A. The latter suggested 
the presence of poor environmental factors operating in the case of older 
children. Hatcher (233) reported that fewer than one-fourth of the mountain
children in a typical mountain school in Virginia were of normal intelligence.
Russell (250) reported a study of gifted children in rural England showing
that fully 50 percent of the group were recruited from the
superior social classes. He also reported slightly higher mental levels when 
the parents did not both come from the same communities. Marsden (243)
compared remote country schools, where many children were suffering from
lack of adequate educational facilities, with the total group in two complete
county surveys. The remote schools had fewer bright children, more average
children, and about the same proportion of slow children.

The Twenty-seventh Yearbook of the National Society for the Study of 
Education (246) dealing with the influence of nature and nurture presents 
conflicting evidence as to the influence of nurture. In this yearbook Rogers 
reported no gain in the I. Q. of sixty-four girls taken from poor economic 
levels and placed in a well-managed institution. On the other hand, Free- 
man reported an increase of ten points in the average I. Q. of children placed 
in foster homes. Orphan children were found by Davis (222) to have 
median I. Q.’s of about 85. Jones and Carr-Saunders (236) reported that 
English orphan children of low status seem to improve mentally, and 
superior children to deteriorate, when placed in orphanages. 

Moss and Hunt (244) analyzed seven thousand scores on the Washing- 
ton Social Intelligence Test. They discovered that business executives make 
higher ratings than all others, and that women show higher ability in matters
of tact and recognition of behavior than men. Strang (255) reported
on the use of the same test. She believed that abstract intelligence is measured
largely by this test, and that otherwise, the test measures only a small
residue of social intelligence. 

A study by Lentz (242) on the size of family and I. Q. showed a lower 
median I. Q. for the larger families. There was but one exception, and the
discrepancy in that case was not statistically significant. From the study of 
6,790 children from 2,712 families Steckel (252) reported a small but 
consistent superiority in the intelligence of younger children over that of 
older children in the same families. Arthur (213) found a similar trend 
among immigrant children to the United States with a significant statistical 
difference of six points in I. Q. In contrast to these findings are those of 
Jones and Hsiao (237) who found no significant correlation between in- 
telligence and birth order in 614 pairs of siblings. If careful records are 
maintained for a few years, very significant data will be available from 
many of the mental clinics on the intelligence of parents and their children. 
Many children will be studied whose parents were also examined in their 
childhood. These studies will take into account such factors as social and 
economic status and will throw important light upon many of our present 
problems. 








Summary 


The age at which mental maturity is reached is still an unsolved problem 
but the evidence points to earlier cessation of growth than was at first be- 
lieved. Distributions of general intelligence as well as most special abilities 
seem to follow the normal frequency curve. Sex differences have not proved 
to be important enough to warrant the establishment of separate norms. 
Race differences have not been accurately determined because of the in- 
fluence of special selective factors and the inability of investigators to 
equate completely the different cultural patterns. Negro and Indian groups 
seem to be inferior to native white stocks according to present tools of meas- 
urement. Studies of social and economic groups show a great overlapping 
in intelligence and a fairly high positive correlation between educational 
and cultural opportunities and intelligence. 
CHAPTER IV


The Construction and Statistical Interpretation of 
Psychological Tests 


INVESTIGATIONS into the nature and function of psychological tests have
been greatly facilitated and advanced by the discovery of sound principles 
of test construction, by the use of better methods in the selection of test 
items, and by improvements in the various kinds of statistical interpreta- 
tion. A few comments on these topics are presented here with their par- 
ticular applications to psychological tests. 


Selection of Test Items 


Terman (301) gave one of the earliest detailed descriptions of the 
construction and scaling of psychological tests in reporting the procedure 
used in standardizing the Stanford Revision of the Binet Test. More recently 
Freeman (277) and Thorndike (302) brought together the best in cur- 
rent theory and practice. Among other questions, Freeman (278) analyzed 
power and speed in tests; and pointed out that if a test is given with time 
limits and again without, and that if a high correlation is found between the 
two results, the test is primarily one of power. Kuhlmann (286) dis- 
covered that if test items were too easy, the problem of proper motivation 
entered in, disturbing the optimum measurement of intelligence. When 
the items are too easy, test results indicate chiefly speed of automatic 
processes. Symonds (299) presented six criteria with respect to the proper 
difficulty of test items: 


(1) The items with which one can measure the ability of an individual most 
accurately are the items that he can do with 50 percent accuracy. 

(2) The test which measures an individual most accurately is one made of items, 
all of which the individual can solve with 50 percent accuracy. 

(3) The best item for measuring two individuals is the item lying in difficulty mid- 
way between the difficulty of the two items which can be answered with 50 percent 
correctness by each of the individuals. 

(4) The best test for measuring two individuals is one composed of items as in (3). 

(5) The best test for measuring a typical school grade or class is a test in which 
all of the items have a difficulty such that they can be answered with 50 percent 
accuracy by the average individual of the group. 

(6) The best test designed to measure several consecutive grades or classes is one 
in which the items have been so selected that they range evenly in difficulty from the 
level of difficulty which can be done with 50 percent accuracy by the average member 
of the lowest group to be tested, to the level of the difficulty which can be done with 
50 percent accuracy by the average member of the highest group to be tested. 
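
One common way to motivate the 50 percent rule in criterion (1) is that a right-or-wrong item spreads examinees apart most when about half of them pass it. The sketch below is an illustration added here, not Symonds' own derivation; the function names are hypothetical. It treats an item as a 0/1 variable whose variance is p(1 - p), where p is the proportion answering correctly.

```python
def item_variance(p):
    """Variance of a right/wrong (1/0) item passed with probability p."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie between 0 and 1")
    return p * (1.0 - p)


def most_informative(difficulties):
    """Return the difficulty value whose item variance is largest."""
    return max(difficulties, key=item_variance)


if __name__ == "__main__":
    # Tabulate the variance for a few illustrative difficulty levels.
    for p in (0.1, 0.3, 0.5, 0.7, 0.9):
        print(f"p = {p:.1f}  variance = {item_variance(p):.2f}")
```

The variance is symmetric about p = .5 and falls to zero for items that everyone passes or everyone fails, which is consistent with the observation that very easy items measure little.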


Clark (269) proposed a plan for evaluating an individual test item, 
not only on the basis of its difficulty, but also in terms of whether it is 
correctly answered by a greater proportion of good students or poor 
students. Obviously a test item lacks validity if it is correctly answered by as 
many of those who rank low as of those who rank high with respect to the 
ability which it purports to measure. 
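
The kind of item evaluation described above can be sketched as follows. This is a minimal illustration under stated assumptions, not Clark's actual plan: examinees are split into a higher and a lower half on a criterion score (the halving rule and all names are hypothetical), and the item is summarized by its difficulty and by the difference in proportion correct between the two halves.

```python
def item_analysis(item_correct, criterion_scores):
    """Return (difficulty, discrimination) for one item.

    item_correct     -- list of 1 (right) or 0 (wrong), one entry per examinee
    criterion_scores -- ability ranking scores for the same examinees
    """
    n = len(item_correct)
    if n != len(criterion_scores) or n < 2:
        raise ValueError("need two lists of equal length, at least 2 examinees")
    # Rank examinees from lowest to highest criterion score.
    order = sorted(range(n), key=lambda i: criterion_scores[i])
    half = n // 2
    low, high = order[:half], order[-half:]
    p_low = sum(item_correct[i] for i in low) / half
    p_high = sum(item_correct[i] for i in high) / half
    difficulty = sum(item_correct) / n  # proportion answering correctly
    return difficulty, p_high - p_low


if __name__ == "__main__":
    scores = [10, 20, 30, 40, 50, 60]
    print(item_analysis([0, 0, 1, 1, 1, 1], scores))  # favors high scorers
    print(item_analysis([1, 0, 1, 1, 0, 1], scores))  # no discrimination
```

A discrimination value near zero corresponds to the case noted in the text: the item is answered correctly by as many low-ranking as high-ranking examinees and so lacks validity.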


Test Construction 


Among the most common types of test items are the completion exercise, 
multiple-choice exercise, and the true-false question. In a comparison of 
true-false and completion tests by Shulson and Crawford (297), the 
latter was as valid as the former on material which was relatively un- 
familiar. The completion test lacked objectivity, but the true-false test 
involved guessing. Hanumantha and Gopalaswami (281) reported that 
in retesting adults with a four-response multiple-choice group test, changes 
were made from wrong answers to right ones about twice as often as the 
reverse, whereas children changed with equal readiness in both 
directions. Fritz (279) showed that when very difficult material was 
presented 62 percent of the items were marked true and 38 percent false, 
although in reality half the items were true and half were false. Lehman 
(289) showed that on a retest superior students make fewer reversals 
and better scores on true-false tests, whereas poorer students make more 
reversals and lower scores. Mathews (290, 291) showed that if alternate 
answers such as “more” or “less” are printed one above another there is 
a tendency to select the upper of the two answers. Likewise, he found that 
if two answers such as “yes-no” are arranged in the order shown here, 
there is a 3.2 percent greater tendency to mark the left answer than the 
one at the right. He also found (1) that changing answers on true-false 
tests raised scores in 63 percent of the cases, lowered them in 34 percent, 
and had no effect in 3 percent; and (2) that changes in multiple responses 
raised scores for 53 percent, lowered them for 21 percent, and produced 
no change in 26 percent of the cases. Arnold (268) found that unless a 
false statement was very ridiculous, there was a stronger tendency to mark 
false statements true than to mark true statements false. 
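
The part that guessing plays in true-false and multiple-choice tests can be given rough numerical form. The sketch below is an illustration added here and assumes pure blind guessing with equally likely responses, which the studies above show is not strictly true of real examinees. Under that assumption the number right follows a binomial distribution; the function name is hypothetical.

```python
from math import sqrt


def chance_score(n_items, n_choices):
    """Mean and standard deviation of the number right under blind guessing."""
    if n_items < 1 or n_choices < 2:
        raise ValueError("need at least 1 item and at least 2 choices")
    p = 1.0 / n_choices  # chance of guessing any one item correctly
    mean = n_items * p
    sd = sqrt(n_items * p * (1.0 - p))
    return mean, sd


if __name__ == "__main__":
    for label, k in (("true-false", 2), ("four-response", 4)):
        mean, sd = chance_score(100, k)
        print(f"{label}: mean = {mean:.1f}, sd = {sd:.2f}")
```

On a hundred-item test the expected chance score is half the items for true-false questions but only a quarter for four-response items, which is one way of seeing why chance is the larger disturbance in the true-false form.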


Scores and Norms 


One of the most concrete and complete discussions of scores and norms 
was offered by Freeman (277). Without doubt variations in types of mate- 
rial, time limits, difficulty of items, and types of response materially affect 
norms; yet from norms conclusions are sometimes drawn as to the com- 
parative intelligence of different age, race, and social groups. 

Pintner (294) reported that the students in a graduate class assigned 
scores ranging from 34 to 85 to the group intelligence booklet of one pupil, 
exclusive of two freak scores of 3 and 18. After training, the range narrowed to 
58 to 90 on a second similar booklet. The average difference from the true 
score was reduced from 9 points to 3 points through training. Dearborn and 
Smith (273) rescored more than five hundred Dearborn tests and found 
many disturbing factors causing particularly persistent underscoring. 
Whether to weight items of varying degrees of difficulty or to assign equal 
values to all items regardless of difficulty is a problem which is still open to 
further investigation. Accidental success on a few of the more difficult items 
by relatively poor subjects tends to neutralize the beneficial effects of weight- 
ing the more difficult items. Kuhlmann (287) proposed an age norm for 
each trial of each test instead of the customary summation of total scores 
on the entire test battery. Thurstone (305) criticized Thorndike’s assump- 
tion that standard deviations by grades were equal and showed that by 
projecting the elements of the test upon a true scale the wide gaps occurring 
in the upper grades may be explained. Ellis (275) suggested that the effect 
of speed on results obtained with timed tests may be measured by giving an 
alternate form of equal difficulty as an untimed test and computing an index 
of speed from the difference in the two scores. 
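
The two uses of timed and untimed scores mentioned in this chapter can be sketched together. The code below is a minimal illustration, assuming that the index of speed is the simple difference between untimed and timed scores (the source does not give Ellis' exact formula) and using the ordinary product-moment correlation for Freeman's power criterion. All names and the sample scores are hypothetical.

```python
from math import sqrt


def pearson_r(x, y):
    """Product-moment correlation between two equal-length score lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two lists of equal length, at least 2 scores")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("scores must vary")
    return sxy / sqrt(sxx * syy)


def speed_indices(timed, untimed):
    """Assumed index of speed: untimed score minus timed score, per examinee."""
    return [u - t for t, u in zip(timed, untimed)]


if __name__ == "__main__":
    timed = [20, 25, 30, 35]      # hypothetical scores on the timed form
    untimed = [30, 34, 41, 45]    # same examinees, alternate untimed form
    print(speed_indices(timed, untimed))
    print(round(pearson_r(timed, untimed), 3))
```

A high correlation between the two administrations would suggest, following Freeman, that the test is primarily one of power, while large differences would point to a substantial speed component.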


Wallin (306) reported on the standardization of the items of the 
Stanford-Binet Test, ages eight and nine, using the scores of 1,382 children. He 
found no significant sex differences. He reported that the eight-year tests 
discriminated between normal children of seven and nine years, but that the 
nine-year tests only partially discriminated between ages eight and nine. 
In the nine-year tests the life experiences of the older and duller groups 
seemed to compensate for brightness. He concluded that the tests on any one 
age level were not equally difficult for any one “brightness” classification. 


Abelson (267) reported on the improvement of intelligence testing. He 
calculated college success criterion scores by the T Scale technic for the 
Thorndike, Roback, Brown, and Thurstone Tests. The results were less 
promising than was hoped on account of the relatively low validity. A use- 
ful bibliography of forty-five titles is appended to this study. Thurstone 
(303) proposed that the absolute zero in intelligence measurement be 
defined as a certain distance below the mean performance of any age group 
in terms of its own standard deviation. He contended that with uniform 
testing conditions the relative variability of absolute test intelligence of 
different age groups is constant. Thurstone (304) also proposed a method 
for combining a large number of scores secured on various tests into a single 
composite score. The principle of scoring involved is that a valid score on 
a series of tests has above it as many successes as there are failures below it. 


Cole (270) constructed a conversion scale for comparing scores on three 
secondary-school intelligence tests using the scores of 6,550 pupils from 
fourteen preparatory schools and employing Holzinger’s (283) transmuta- 
tion formula. 


Test Reliability 


The reliability of psychological tests has been quite extensively investi- 
gated in the past few years with the correlation technic. Rugg and Colloton 
(296) reported on earlier studies, and more recent studies have been made 
by Hildreth, Lincoln, Randall, and Slocombe. Pintner (295) reported on 
the standardization of the Pintner-Cunningham Primary Test with norms 
based on 29,533 children. The reliability coefficients ranged from .72 to .85. 
Cowdery (271) reported the administration of the Thorndike Intelligence 
Examination with repeated tests at intervals ranging from a few days to 
one, two, and three years. He found a declining reliability coefficient, from 
.80 to .648, with the longer time intervals between tests. He interpreted his 
results as due to changes in attitude and to the varied educational experi- 
ences of the subjects in the interim. Kornhauser (285) and Lanier (288) 
presented further studies of the reliability of tests. Symonds (300) 
discovered the following factors which make for greater reliability in a test: 
(1) many test items, (2) a long time limit, (3) a narrow range of difficulty, 
(4) few interdependent items, (5) little operation of chance, (6) correct 
scoring, and (7) objective scoring. Edgerton and Toops (274) derived 
tables for predicting validity and reliability coefficients of a test when it is 
lengthened. Holzinger (283) evaluated the well-known Spearman-Brown 
formula for predicting the reliability of lengthened tests and also presented 
a formula for predicting their validity. 
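
The Spearman-Brown formula referred to above predicts the reliability of a test lengthened k times from the reliability r of the original: r_k = k r / (1 + (k - 1) r). The sketch below also includes the standard classical expression for the validity of a lengthened test; whether that expression matches the exact form Holzinger presented is an assumption, and the function names are hypothetical.

```python
from math import sqrt


def spearman_brown(r, k):
    """Predicted reliability when a test of reliability r is lengthened k times."""
    denom = 1.0 + (k - 1.0) * r
    if k <= 0 or denom <= 0:
        raise ValueError("k must be positive and 1 + (k - 1) r must be positive")
    return k * r / denom


def lengthened_validity(r_xy, r_xx, k):
    """Predicted validity of a test lengthened k times.

    r_xy -- validity of the original test against the criterion
    r_xx -- reliability of the original test
    """
    denom = 1.0 + (k - 1.0) * r_xx
    if k <= 0 or denom <= 0:
        raise ValueError("k must be positive and 1 + (k - 1) r_xx must be positive")
    return r_xy * sqrt(k) / sqrt(denom)


if __name__ == "__main__":
    # Doubling a test whose reliability is .72, the lowest value cited above.
    print(round(spearman_brown(0.72, 2), 3))
    print(round(lengthened_validity(0.40, 0.50, 2), 3))
```

Both formulas assume that the added material is equivalent to the original, so the gain from lengthening shrinks as the original reliability rises.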


Constancy of the I. Q. 


Studies of test reliability helped to create and intensify interest in the 
question: Does the I. Q. remain constant? Investigators who were making 
repeated measurements in order to establish the reliability of a particular 
test were confronted with this challenging problem. Consequently, many 
studies were launched dealing directly with the constancy of the I. Q. 

Matthew and Luckey (292), in the Twenty-seventh Yearbook of the National 
Society for the Study of Education, reported on thirty-eight children 
whose I. Q.’s shifted more than five points on retests. In all but seven 
cases there appeared to be some unusual factor in the child’s make-up, 
or in the conditions surrounding the tests, which conceivably could result 
in the observed instability of the intelligence quotient. Most of the studies 
which have been reported show correlations of approximately .90 to .95 
between the scores on repeated tests. Dearborn and Long (272) compared 
I. Q.’s at different age levels and concluded that if the I. Q. is constant, 
then the relative ability of the child is not constant but varies according 
to some unknown law with which the Binet Test happens to be in con- 
formity. Foran (276) offered a supplementary review of the constancy 
of the intelligence quotient involving thirty recent studies. He concluded 
that the I. Q.’s of the feeble-minded were more constant than were those 
of normal children, and that they were more constant for individuals six- 
teen years of age than for those below the age of sixteen at the time of the 
first test. 


Correlations 


The use of correlations in determining reliability of tests has already 
been noted. Correlation is also an important tool in other phases of psycho- 
logical tests such as evaluating special abilities and disabilities, or study- 
ing group mental measures. Garrett (280) offered a simple yet com- 
prehensive discussion of the various correlation technics. Hull (284) 
emphasized certain facts concerning the uses of correlations in prediction. 
In his opinion, correlations lower than .5 are of no predictive value; be- 
tween .5 and .6 they are possibly useful; between .6 and .7 they are of 
genuine but limited value; between .7 and .8 they are of decided value but 
rarely found; and correlations above .8 are not obtained by present 
methods. 
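
Hull's classification can be written out as a small lookup, shown below as a sketch. The treatment of the boundary values (.5, .6, .7, .8) is an assumption, since the text gives only the ranges. The second function computes the index of forecasting efficiency, 1 - sqrt(1 - r^2), a conventional measure of how much a correlation reduces the error of prediction; it is added here for illustration and is not attributed to Hull by the text above. Function names are hypothetical.

```python
from math import sqrt


def hull_category(r):
    """Verbal rating of a correlation's predictive value, after Hull's ranges."""
    r = abs(r)
    if r > 1.0:
        raise ValueError("a correlation cannot exceed 1 in absolute value")
    if r < 0.5:
        return "no predictive value"
    if r < 0.6:
        return "possibly useful"
    if r < 0.7:
        return "genuine but limited value"
    if r <= 0.8:
        return "decided value but rarely found"
    return "not obtained by present methods"


def forecasting_efficiency(r):
    """Proportional reduction in the standard error of estimate."""
    if abs(r) > 1.0:
        raise ValueError("a correlation cannot exceed 1 in absolute value")
    return 1.0 - sqrt(1.0 - r * r)


if __name__ == "__main__":
    for r in (0.3, 0.5, 0.6, 0.7, 0.8):
        print(f"r = {r:.1f}: {hull_category(r)}, "
              f"efficiency = {forecasting_efficiency(r):.2f}")
```

The efficiency index rises slowly at first, which agrees with the judgment that low correlations add little to prediction.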

Partial and multiple correlation will undoubtedly prove extremely 
useful in studying the effects of various factors other than nature which 
contribute to scores on psychological tests, but studies in this field are still 
meager on account of the scarcity of numerical measures for the non- 
intellectual factors which operate on test results. These correlation technics 
have been called into use somewhat more extensively since the publication 
of Spearman’s (298) treatise on the two-factor theory of intelligence. 
Studies by Pearson and Moul (293) and Holzinger (282) illustrate this 
trend. 
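
The simplest case of partial correlation, in which one factor is held constant, follows the standard first-order formula r12.3 = (r12 - r13 r23) / sqrt((1 - r13^2)(1 - r23^2)). The sketch below is an illustration only; the variables named in the example (test score, school marks, and amount of schooling as a non-intellectual factor) and the correlation values are hypothetical.

```python
from math import sqrt


def partial_r(r12, r13, r23):
    """Correlation between variables 1 and 2 with variable 3 held constant."""
    denom = (1.0 - r13 ** 2) * (1.0 - r23 ** 2)
    if denom <= 0:
        raise ValueError("r13 and r23 must be less than 1 in absolute value")
    return (r12 - r13 * r23) / sqrt(denom)


if __name__ == "__main__":
    # Hypothetical values: 1 = test score, 2 = school marks, 3 = schooling.
    r_test_marks = 0.50
    r_test_school = 0.40
    r_marks_school = 0.30
    print(round(partial_r(r_test_marks, r_test_school, r_marks_school), 3))
```

When the third factor is unrelated to both of the others, the partial correlation equals the original one; the more strongly it is related to both, the more the original figure is altered.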


Summary 


Test items should be selected according to definite criteria of difficulty 
and validity. Chance affects test results to a considerable degree when 
true-false test items are employed and to a lesser but appreciable extent 
when multiple-choice exercises are used. Interpretations of psychological 
tests should be made, not only on the basis of statistical analyses, but 
also with due consideration for the influence of test items on the statistical 
interpretations themselves. The uses of statistics and complex correlations 
will be enhanced when tests are devised to measure more varied phases 
of intelligence and qualities other than those measured by existing tests. 


CHAPTER V 
The General Uses of Psychological Tests 


THE VARIETY of purposes for which psychological tests are used may be 
classified under the heads of either diagnosis or prediction. Even their use 
for purposes of diagnosis may be thought of as a phase of prediction. 
Diagnosis is of no significance educationally unless it is followed by some 
sort of educational procedure based upon the predictive value of the facts 
established. Appraisal and measurement of intellectual status is a necessary 
preliminary to the inauguration of any instructional program, knowledge 
of status per se being of interest chiefly to sociologists and others concerned 
with various aspects of social welfare. Consequently, this chapter places 
major emphasis on the various predictive uses of intelligence tests. The 
discussion begins with a statement of specific uses which the tests may have. 
Studies relating to their use in classification, in predicting success in school 
subjects at the various school levels, and in predicting teaching success are 
then cited. 


Uses Made of Test Results 


McClure (328) studied, by means of a questionnaire, the prevalence of 
psychological services in certain large city public-school systems and listed 
the various ways in which tests were reported to be used in those cities. Most 
important of all, according to this report, was the use of psychological tests 
as a device for sectioning classes or for classifying pupils into homogeneous 
groups for instructional purposes. Other uses listed were determination of 
low mentality for purposes of exclusion from school; vocational guidance; 
recommendation for work permits; adjustment of problem children; selec- 
tion of pupils for special classes; high-school graduation; eligibility of non- 
resident pupils for admission to high school; classification of beginners; 
as an element in promotion standards; appraisal of the curriculum; evalu- 
ating methods of teaching; objectifying standards of achievement; recom- 
mendation of candidates for scholarships; skipping and accelerating pupils; 
promotion of doubtful cases; psychiatric and neurological observation; 
and, as part of the diagnosis of juvenile court cases and cases referred to 
social agencies. This study does not specify in detail the manner in which 
tests are used for these various purposes. There is obviously much over- 
lapping in the functions described. These uses, so far as they represent actual 
and possible uses of tests, may be taken as representative of practice where 
psychological testing is carried on. 


Use of Tests in Classification 


The use of psychological tests as a basis for sectioning and grouping of 
normal pupils was presented by Rankin (338: 205-10) in an earlier number 
of the Review. The conclusion reached in that analysis concerning the 
practice of grouping was that there is little positive evidence of its value in 
terms of better results achieved. This condition is held to be due to the 
probability that little is done about modification of curricula or methods 
of teaching after grouping has been brought about. This does not necessarily 
reflect upon the theoretical desirability of homogeneous grouping, or upon 
the validity of psychological tests as a sole or contributing device in the 
organization of such groups. Many of the studies bearing upon this question 
were reviewed in the earlier report, but one in particular merits further 
attention. Keliher (325) attacked the whole practice of grouping on the 
grounds that it did not square with an educational theory which conceived 
education as growth and which should take the whole child into account. 
So far as tests are concerned, her point was that existing tests do not measure 
all the abilities which it is necessary to consider in achieving any degree of 
homogeneity which may be used as a sound basis for educational procedures 
affecting the whole personality of the child. She claimed, furthermore, that 
individuals are so specific in their abilities that it is impossible to achieve 
such homogeneity. 

Symonds (342) answered the argument insofar as the use of intelligence 
tests was concerned. His defense, however, was based largely upon the 
assumption that the existing school curriculum, conceived in terms of static and 
extrinsic subject-matter categories, is necessarily the norm against which 
classification devices must be measured. Keliher based her argument against 
grouping largely upon a denial of this assumption. 

In summarizing the studies having to do with bases of grouping, Rankin 
(338: 210) in the earlier Review said: 


The available evidence at the present time suggests that teacher judgment and results 
of intelligence tests are of approximately equal importance for ability grouping, and that 
both should be used in the classification of pupils into homogeneous groups. Other 
factors should undoubtedly be utilized also, and, indeed, different ones are used in 
various places with a degree of success, but none of them is in general use. However, 
where it is possible to group separately in different subjects it is agreed that educational 
tests are of very great value. 


Psychological Tests as a Predictive Device 


The second major administrative use of psychological tests is for pur- 
poses of prediction. Although, as has been pointed out, the use of tests as a 
basis for classification is essentially a phase of prediction, the difference in 
the two uses lies largely in the technics employed. Those who have been 
interested chiefly in the predictive value of tests have usually relied upon 
some form of regression equation, taking account of whatever variables 
seemed to be significant. Much attention has been given to the possible 
predictive value of simple correlations between intelligence test scores and 
measures of ability in the function for which a predictive device was desired. 
For the most part, correlations between intelligence test ratings and ability 
in other functions, such as scholastic success in high school or college, suc- 
cess as teachers, persistence in college, and the like, are low and of relatively 
little statistical significance. Most writers attribute the low correlations 
largely to the unreliability of the psychological tests, to the absence of ade- 
quate and valid measures of success in other functions, or to the nature and 
scope of abilities within the population measured. 
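
The effect of combining two predictors in a regression equation can be sketched from the correlations alone. The code below is a minimal illustration using the standard two-predictor formulas in standard-score form; it does not reproduce any particular investigator's equation, and the names and sample correlations are hypothetical.

```python
from math import sqrt


def two_predictor_regression(r1y, r2y, r12):
    """Return (beta1, beta2, R) for predicting y from predictors 1 and 2.

    r1y, r2y -- correlation of each predictor with the criterion
    r12      -- correlation between the two predictors
    """
    denom = 1.0 - r12 ** 2
    if denom <= 0:
        raise ValueError("the predictors must not be perfectly correlated")
    beta1 = (r1y - r2y * r12) / denom
    beta2 = (r2y - r1y * r12) / denom
    r_squared = beta1 * r1y + beta2 * r2y
    if r_squared < 0 or r_squared > 1:
        raise ValueError("the three correlations are not mutually consistent")
    return beta1, beta2, sqrt(r_squared)


def predict_z(beta1, beta2, z1, z2):
    """Predicted criterion standard score from two predictor standard scores."""
    return beta1 * z1 + beta2 * z2


if __name__ == "__main__":
    # Hypothetical case: an intelligence test and school marks each correlate
    # .50 with college marks and .40 with each other.
    b1, b2, big_r = two_predictor_regression(0.50, 0.50, 0.40)
    print(round(b1, 3), round(b2, 3), round(big_r, 3))
```

The gain from a second predictor depends on how little it overlaps with the first: two measures that correlate highly with each other add little to the multiple correlation.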

Many investigators have calculated coefficients of correlation between 
academic success in school or college and various measures of intellectual 
status. Pintner (333) listed many tables of such correlations, both for high- 
school and college students, reported by various authors. Most of these 
range between .40 and .60, but the number falling below .40 is greater than 
the number falling above .60. Many studies of this type were doubtless 
launched because of the desire of administrative officials to find reliable 
measures which might be used in connection with entrance requirements or 
as a guide in connection with the subsequent elimination of students who 
do not prove able to get along. Thus, Crawford (314: 125) said of college 
students: 


Reliable estimates of individual students’ fitness for college work are obviously impor- 
tant in determining whether or not they should be admitted. The whole array of school 
credits, entrance examinations, psychological and other tests, evidences of good char- 
acter, personal references, ratings, etc., is concerned with two questions: (1) Can this 
candidate probably maintain a satisfactory record in his college studies? (2) If so, is 
he, in other respects also, the sort of individual whom we want in our student body? 


This view doubtless underlies much of the investigation that has been 
carried on with respect to high-school students as well. But in their case, 
there is less significance in the question of initial estimate before admission, 
for it is coming to be generally conceded that the high school should be 
available for all who choose to come. Therefore, at the high-school level, 
psychological tests are of value, for the most part, in guidance after admis- 
sion. The slight attention which has been given to the prediction of high- 
school success is doubtless due to the more universal character of the high 
school, its definite place in the public-school system, and the consequent 
greater attempt to develop educational activities well adapted to whatever 
quality of mentality is found therein. Admission to high school seems well 
enough restricted, so far as mentality is concerned, by the necessity of pass- 
ing elementary or junior high-school examinations as a pre-requisite to 
promotion. This democratic attitude toward educational opportunity has not 
become so general among those concerned with college problems. Hence, 
much thought is still given to the possibility of using psychological tests 
as barriers for the elimination of the undesirable. 

A number of attempts to predict high-school achievement have been made. 
Ross (339, 340) studied the possibility of predicting success in high-school 
subjects on the basis of elementary-school records, but intelligence tests 
were not used as part of the data. Proctor (336) used psychological test 
results, as well as other evidences of ability, in the guidance of students in 
their choice of studies. Todd (343) also tried to discover whether a certain 
group test of mental ability had value for prediction of success in particular 
high-school subjects. She found that the correlation between the entire 
battery and various subjects ranged from .16 for commercial subjects to .33 
for English and mathematics, and that individual tests of the battery were 
in no case of greater predictive value than the entire test. Tozer (345) 
analyzed results of tests of high-school students in grades nine to twelve and 
calculated regression formulae. Ratings on intelligence tests and on certain 
study habits seemed to have greatest predictive value when combined. By 
means of the formula he was able to predict the marks in 77.72 percent of 
the cases for ninth- and tenth-grade students and in 83.26 percent of the 
cases for eleventh- and twelfth-grade students. He concluded that accurate 
intelligence ratings would be of great aid to a counsellor in high-school 
guidance work. 

These studies bring out the point that psychological test results alone 
are not an adequate basis for prediction, even in connection with activities 
which might be thought most closely related to those mental functions 
measured by the existing tests. 


Prediction of Academic Success in College 


Many studies of prediction have had to do with college students. The 
investigators have used high-school marks, results of intelligence tests, 
entrance examinations, and early evidence of success in the college itself, in 
devising predictive formulae. Some of these studies have been very detailed 
and exhaustive; for example, those of Odell (332) and Edgerton and Toops 
(318). Odell’s study was based upon the records of high-school students in 
Illinois and their continuation in various colleges in the state. Simple pre- 
diction of college marks from high-school marks and intelligence test scores 
gave correlation coefficients of .20 to .50. Prediction of college marks in 
the upper three years based upon various combinations of high-school and 
freshman data gave correlations only slightly higher. Odell concluded that 
there is a definite relation between these factors which at least enables one 
to raise the selection of college students above the guessing point, perhaps 
to the degree that the guess element can be reduced to about one-half. 
Edgerton and Toops (318) made a similar study of university students and 
found low correlations between percentile ranks in intelligence test scores 
and persistence in college. 





Other studies have been briefer and have given results similar to the more 
detailed and exhaustive ones. Gerberich and Stoddard (322) were able to 
obtain a correlation of .50 to .54 between their test battery and first semester 
college marks of a group of selected students. The group was composed of 
students who ranked in the upper 10 percent in intelligence among the Iowa 
high-school seniors tested in their survey. Alderman (307) concluded from 
examination of the records of failing university freshmen that there was a 
direct and significant relation between intelligence and subsequent record, 
but he did not attempt to correlate the measures. Cleeton (312) found that 
the Thorndike Intelligence and Iowa Content Examinations were of about 
equal value in predicting success in a college pre-engineering course, and 
that prediction could be raised to a coefficient of about .60 to .65 by com- 
bining the two. Crawford (314) found that scholastic aptitude tests alone 
correlated about .40 for Plan A and .48 for Plan B students at Yale. When 
the College Entrance Board Examinations were combined with school rec- 
ords, scholastic aptitude tests, and age of entrance, a formula of prediction 
was devised which gave a coefficient of .7358 for Plan A and .68 for Plan B 
students. Potthoff (335) pointed out the difficulty of finding bases that are 
adequate for prediction in all cases, but he held, on the basis of correlations 
obtained between two-year course records and intelligence, and between 
high-school average marks and first-year college marks, that retention and 
dismissal at the end of the first quarter might be placed on a much more 
adequate basis. Dilley (316) found that psychological tests were fairly 
accurate in identifying a group of students who will not be found in the 
higher scholarship ranks in college, but that many in the lower ranks on 
intelligence tests do make satisfactory records in college. 

Davidson and MacPhail (315) found correlations of .50 to .55 between 
psychological test scores and college marks. Freshmen in the lowest tenth, 
according to test scores, received seven or eight times as many failing course 
marks as those in the highest tenth. Students in the lowest tenth had about 
an even chance of remaining more than one year and about one chance in 
three of remaining till the end of their senior year. Two-fifths of the students 
who were refused registration during or at the end of the freshman year 
because of poor scholarship came from the lowest tenth. Rank in prepara- 
tory school senior class, weighted according to size of school, and test scores 
from the preparatory school combined with individual test scores, correlated 
.70 with freshman marks. Freeman (321) said that the chances of a stu- 
dent’s remaining until graduation range from 48.7 per hundred for students 
in the lowest decile of the psychological examination to 87.8 per hundred 
for those in the highest decile. He concluded that mental tests can be used 
only as supplementary information in attempting to predict academic 
survival. 

Gray (323) found young college students who were admitted under the 
age of sixteen to be definitely superior in intelligence to their own and other 
college bodies generally and, as a group, to get along better than a control 
group made up of a random selection of college students who were on the 
average two years older. Kornhauser (326) made studies of student records 
extending over several years, but all correlations between intelligence, high- 
school record, or other measures and college success were low, ranging from 
.45 to .60. Like Dilley, he found that psychological tests were chiefly 
valuable as predictive devices within the low ranges of the abilities measured. 
Toll (344) reported results of several different tests used at Amherst, but 
found no test that picked out a lowest 4 or 5 percent for whom one could 
safely predict college failure, either for the freshman year or the entire 
course. 

Many other similar studies of college and high-school students have been 
reported, such as those of Brigham (308), Condit (313), Maddocks (329), 
Nelson and Denny (330), and O’Brien (331). All are of substantially the 
same import. Psychological tests are valuable aids in prediction, but they 
must be combined with other measures to obtain any degree of reliability. 
Even then such measures can be used safely only for the extreme cases. 
Frasier and Heilman (320) are perhaps more enthusiastic in their view of 
the predictive value of psychological tests than most other writers. They 
commend them heartily. 


Prediction of Teaching Success 


Psychological tests have been used frequently in attempting to predict 
the success of normal-school and teachers-college students in practice teach- 
ing and in subsequent field service. Practically all of these studies have 
arrived at the conclusion that such tests are of little significance. Broom 
(309) studied the records of student teachers in a California teachers col- 
lege. He found that the highest correlation was between practice teaching 
marks and total equivalent scores on the Thorndike Intelligence Examina- 
tion, although the correlation was only .296. He suggested, as Waddell had 
done earlier, that training institutions desiring to improve the quality of 
their product might do well to eliminate the lowest 5 percent of their appli- 
cants on the basis of a reasonable intelligence test. Cahoon (311) reviewed 
a considerable number of earlier studies in this field showing the slight value 
of psychological tests in prediction of teaching success and analyzed in 
another study some data from the University of California. He summarized 
his studies as follows (310: 227): 


As far as the possibility of using intelligence test scores as a determinant in predicting 
success in practice teaching is concerned, there seems to be no indication from the data 
presented in this study that the intelligence test scores of the student teacher group are 
related to the degree of their possible success as practice teachers. 


Frasier (319) studied the intelligence test records and teaching success of 
the highest and lowest 5 percent of two groups of students in a state normal 
school. He found a correlation of -.028 between Alpha scores and student 
teaching. These results are even less favorable than those reported in most 
other studies. Frasier attributed them to the greater reliability of the tests 
used in other studies. He developed one point that seems to have escaped the 
attention of many other authors, namely, the fact that normal-school stu- 
dents are already a select group with respect to intelligence. With enough 
intelligence to graduate from high school, further intelligence seems to have 
little significance in ultimate success in teaching. He reviewed the older 
studies of Whitney and Waddell. Krieger (327) concluded that the general
psychological examination can be used constructively in the guidance of 
those students who score very high or very low, but that the test scores have 
little significance in the guidance of those of mediocre ability. 

Pyle (337: 261) recognized clearly the lack of relationship between
psychological test scores and teaching success. He said:


It is clear that the correlation of teaching success with intelligence scores is practi- 
cally zero. But more remarkable still, the correlation between the grade received in the 
third course in practice teaching and later success as estimated by principals is only 
.146. The interpretation of this correlation is that success in practice teaching is of
only slight value in predicting later success as measured by principals. 

While intelligence tests enable us to predict with some success the academic records 
of students, they do not enable us to predict success in practice teaching nor later 
success in actual teaching. 


Sorenson (341) made a detailed analysis of the problem. He reviewed
several of the studies which stress the lack of relationship between
intelligence and teaching success and pointed out that there are two important
reasons for this lack. One is the intellectual homogeneity of the group and 
the other is the unknown reliability and validity of estimated teaching 
success. 
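
Sorenson's two reasons correspond to two standard psychometric formulas, and a small numerical sketch may make their combined effect concrete. The Python below applies the usual formula for direct restriction of range on the predictor, followed by Spearman's attenuation formula for unreliable measures. Every numerical input (a correlation of .50 in an unselected population, a selected group with half the spread, reliabilities of .90 and .50) is a hypothetical value chosen for illustration and is not taken from any study reviewed here; the function names are likewise illustrative, and chaining the two steps in this order is a rough approximation.

```python
import math


def restricted_r(r_full: float, sd_ratio: float) -> float:
    """Correlation expected in a selected group (direct range restriction).

    r_full   -- correlation in the unselected population
    sd_ratio -- SD of the predictor in the selected group divided by
                its SD in the unselected population (0 < sd_ratio <= 1)
    """
    return (r_full * sd_ratio) / math.sqrt(
        1.0 - r_full**2 + (r_full**2) * (sd_ratio**2)
    )


def attenuated_r(r_true: float, rel_x: float, rel_y: float) -> float:
    """Observed correlation after attenuation by unreliable measures."""
    return r_true * math.sqrt(rel_x * rel_y)


# Hypothetical inputs, for illustration only (none come from the studies cited).
r_population = 0.50        # assumed test-criterion correlation, unselected group
spread_ratio = 0.50        # assumed: selected group has half the spread
test_reliability = 0.90    # assumed reliability of the intelligence test
rating_reliability = 0.50  # assumed reliability of teaching-success ratings

r_selected = restricted_r(r_population, spread_ratio)
r_observed = attenuated_r(r_selected, test_reliability, rating_reliability)

print(f"after selection:             {r_selected:.3f}")
print(f"after unreliable measures:   {r_observed:.3f}")
```

Under these assumed values a relationship of .50 in the general population would appear as a coefficient below .20 in the selected group, so low observed coefficients among student teachers do not by themselves show that intelligence is unrelated to teaching success.
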


Conclusion 


All attempts at prediction are, in part, dependent upon a criterion which 
is assumed to be non-variable whether that criterion be the demands of a 
high-school or college course or the teaching profession. If these demands 
always remained constant, better progress might be made in the discovery 
of combinations of variable factors which would be sufficiently accurate for 
practical predictive purposes. But such factors do not remain constant. This, 
coupled with the unreliability of the measures of the factors which are 
known to vary, makes prediction precarious in the extreme. Of greater sig- 
nificance still, educationally, is the implied assumption underlying all such 
attempts at prediction that the criterion represents the ideal. Greater returns 
to society may possibly be had by changing the nature of the demands to 
suit the abilities of the individuals concerned, particularly in such matters 
as high-school or college attendance. Predictive devices are of limited value 
in estimating whether individual students are likely to be benefited by
educational programs as they are now organized and managed, but they are
even less valuable in determining whether educational programs ought to
be organized and managed otherwise.

In summary, it may be said that psychological tests have great adminis- 
trative usefulness, but that they are not a reliable sole basis for any ad- 
ministrative decision. Combined with measures of other traits and abilities, 
and with more or less subjective judgment of some of them, they are of value 
for the sectioning of classes into more or less homogeneous groups, for the 
selection of students for recommendation to colleges and universities, and in 
the prediction of academic success at higher levels. 











CHAPTER VI 
The Uses of Psychological Tests for Atypical Groups


THE PSYCHOLOGICAL testing movement had its beginnings largely in
connection with the problems of mental deficiency and abnormality. As
instruments were perfected and as interest developed, tests were applied to atypical
groups such as the criminal, the delinquent, and the dependent. Following 
their use with these special groups, tests were applied in large numbers to 
normal populations such as characterize the usual public-school situation. 
Their most recent contribution has been the discovery of children of 
superior intelligence, so far as their psychological characteristics are con- 
cerned. Simultaneously, attempts have been made to measure the intelli- 
gence of the blind and the deaf. Progress has been somewhat slower here 
because of the lack of suitable tests and because of the less universal interest 
in such problems. Naturally the greatest advances in psychological testing, 
both in point of technic and amount of work done, have occurred in connec- 
tion with the normal group, but the measuring of the extreme deviates has 
done much to round out and clarify the concept of the nature of intelligence, 
the extent and character of individual differences, and the distribution of 
mental power throughout the entire population. 


The Feeble-Minded 


The use of psychological tests has brought the concept of feeble-minded- 
ness definitely into the field of psychology and education. Before the advent 
of the Binet Scale, definitions of feeble-mindedness were phrased in legal, 
medical, or sociological terms, having but little significance for those con- 
cerned with education. Seguin became interested very early in the possibili- 
ties of education for this group. He approached the problem almost entirely 
from a physiological point of view. His work had considerable influence in 
the improvement of conditions surrounding these persons, but little progress 
was made in educational procedures until Binet laid the psychological foun- 
dations for such work. The early work of Binet and Norsworthy and the 
subsequent investigations of Hollingworth, Burt, and many others seem to 
indicate that the deficiencies of the feeble-minded are deficiencies of degree 
rather than of kind. They are subject to the same laws of learning and 
intellectual development as are other persons. 

Medical workers still place considerable emphasis on classification ac- 
cording to clinical types, but the psychological types of idiot, imbecile, and 
moron are not definitely delimited. There is more or less difference of
opinion as to the upper limit of intelligence which shall be taken as
differentiating the feeble-minded from the normal.

Since there is little agreement on the level which separates the
feeble-minded from the normal, estimates vary as to the number of feeble-minded
persons in the total population. Pintner’s (392) summary of various
estimates is given in Table 1.


TABLE 1.—ESTIMATES OF THE PERCENTAGE OF FEEBLE-MINDED

Authority                                                   Percent
[illegible]                                                     .40
[illegible]                                                     .50
United States, Bailey and Haber, 1920                           .65
[illegible]                                                     .70
British Mental Deficiency Commission, Report, 1929              .73
Oneida County, New York, Carlisle, 1918                         .73
Porter County, Indiana, Clark, 1916                     [illegible]
[illegible]                                                    1.80
Rural Survey, Delaware, Mullan, 1916                           1.80
[illegible]                                                    2.00
[illegible]                                                    2.00
[illegible]                                                    2.00
[illegible]                                                    3.00
Popenoe’s Estimate for U. S. A., 1929                          4.00
[illegible]                                           4.[illegible]
Eight Minnesota Towns, Kuhlmann, 1928                          4.70
X. County, Minnesota, Anderson, 1922                           6.10








Researches and reports bearing upon the problem of feeble-mindedness 
are reported annually by Pintner (393). The majority of more recent studies 
seems to bear upon the incidence of mental deficiency, although there is some 
attention to matters of clinical interest, such as the rate of mental develop- 
ment and constancy of mental characteristics. 

Pressey and Pressey (397) discussed the relation between I. Q. and 
diagnosis of feeble-mindedness. It is their view that an I. Q. of 70 or below 
is a strong indication of feeble-mindedness. Minogue (387) found that about
72 percent of the feeble-minded remain relatively constant on retests, but of 
those who vary, a larger number lose than gain. Wallin (405) found that 
the feeble-minded scatter less on Binet Tests than do normal children, and 
that scattering among unstable psychopaths is not enough to constitute a 
reliable diagnostic sign. According to Fox (363) the feeble-minded tend to 
make the best showing on the same tests which are done most successfully 
by normal children. Differentiation between feeble-minded and normal 
children is more nearly shown by tasks which are more definitely of a mental 
character, according to Wilson (409). Bennett (350) compared pupils in 
special classes with children in the regular elementary schools matched for 
age and I. Q. She found slight differences in educational status. Physical 
defects and abnormalities were somewhat more numerous among the special
class group. 

Lewis (366), according to Part IV of the Report of the British Mental 
Deficiency Committee, found that .85 percent of the children in six areas, 
tested with group and Binet tests, were feeble-minded or had I. Q.’s below 
60. The estimated incidence of feeble-mindedness for the entire population 
is .73 percent, which is about double the percentage reported by the Com- 
mission in 1906. Popenoe (396) analyzed various surveys of schools and 
the like and estimated that 4 percent of the population fell below an I. Q. 
of 70. Wallin’s (405) survey of special education in Baltimore showed that 
24.3 percent of the white and 73.4 percent of the colored children had 
I. Q.’s below 68. Town (402) reported that 24 percent of 695 behavior prob- 
lem cases and 20 percent of 75 unmarried mothers were feeble-minded. 
Bridgman’s (351) report on 3,675 cases brought to a clinic showed 1.5 
percent male and 1.2 percent female idiots and 5.7 percent male and 6.9 
percent female imbeciles. There was a much larger number of female than 
male morons. Lincoln (379) found that the I. Q.’s of problem children ex- 
amined at a clinic ranged from 20 to 110 with a median of 75. Willhite and 
others (407) found that the incidence of feeble-mindedness in South Dakota 
is about .5 percent of the total population. Only about one-seventh of that 
number are in institutions. Of some eugenic interest is the report of Martz 
(384) who found that ten out of twenty-five children born of low-grade 
mothers are of normal intelligence. Town and Hill (403) investigated the 
records of persons returned to the community from a state institution for 
the feeble-minded. These persons were supposed to be fitted for life outside 
the institution. About 14 percent were total economic failures. 


Use of Tests in the Identification of the Subnormal 


Psychological tests were first used educationally in the selection of pupils 
who were mentally deficient and who were to be assigned to special or 
auxiliary classes. This use preceded their use for purposes of general classi- 
fication or for the selection of gifted pupils, doubtless due to the fact that 
extreme variations in individual differences were first noted and those varia- 
tions at the lower end of the scale were most conspicuous. Bright pupils 
might easily go unobserved in a school régime gauged to the abilities of the 
mediocre, but dull pupils could not keep up, regardless of pressure. They 
naturally received the bulk of attention in the early application of psycho- 
logical tests to school pupils. Pintner (392) traced the early development 
of school testing directed toward the identification of dull and subnormal 
pupils by means of individual and group tests. Hilleboe (370) reviewed the 
studies bearing upon the problem of diagnosis, assignment, and educational 
treatment of this group as well as others who may be classed as atypical. 
His conclusions as to the use of psychological tests in the identification of 
the mentally subnormal were as follows: 


(1) Group tests are an economical and accurate method of discovering the mentally
subnormal. 

(2) Boys and girls with I. Q.’s of 85 or below, those indicated as subnormal by 
teachers, those who show marked irregularity in the group tests, and those who are 
retarded in school progress, may well be selected for further diagnosis. 

(3) Individual intelligence tests should be given to all boys and girls discovered by 
the initial selective process. 

(4) Verbal tests should be supplemented by motor tests for the linguistically handi- 
capped, and by tests of special abilities for all those whose record indicates the de- 
sirability of such tests. 

(5) The criterion of three years or more of mental retardation as a requisite for 
admission to special classes is unsatisfactory from the standpoint of delay in selection. 


He further pointed out that test results must be combined with many
other items in arriving at a true diagnosis of mental subnormality; and that, 
so far as school history is concerned, pupils may appear to be subnormal 
who are suffering from various physical handicaps. Broady (352) also 
reviewed many other studies in his formulation of administrative policies. 
His conclusions with respect to the use of tests for purposes of diagnosis and 
classification were not at variance with those of Hilleboe. The older books 
and studies which have had to do with classification and diagnosis of this 
group, such as those of Goddard, Wallin, Anderson, Porteus, Hollingworth, 
and Doll, were reviewed by both Pintner and Hilleboe in the references 
cited. Bennett (350) studied subnormal children in the regular elementary 
schools in comparison with those who had been assigned to special classes. 
Her purpose was primarily that of appraisal of educational status rather 
than the evaluation of psychological tests in selection. She found that there 
were no marked differences in any of the psychological characteristics of 
the two groups, but this seems to reflect upon the basis used in the initial 
recommendation for psychological examination rather than upon the valid- 
ity of any of the devices used in diagnosis. In a city which does not have 
facilities for testing all children, it seems obvious that the more noticeable 
characteristics, such as physical stigmata and disciplinary troubles, would 
be seized upon as an indication of mental deficiency where the only other 
criterion was school retardation. 


Use of Tests in the Identification of the Superior 


Due to the concentration of attention on the measurement of the sub- 
normal and abnormal deviate, early psychological tests were not well 
adapted to the discovery of superior intelligence. Measurement of persons 
with superior intelligence had to await the development of better scales. 
Partly for this reason and partly because of the fact that superior children 
are less conspicuous in an ordinary school situation, experimental work in 
measurement as well as in education has been less extensive and voluminous. 
Nevertheless, as Pintner (392: 350) indicated in his recent book: 


. . . the child of superior intelligence has been discovered by the intelligence test.
Previous to this time we have had geniuses, peculiar freaks, and extraordinary prodigies, 
and the connotation attaching to such words as “genius,” “prodigy,” “precocity,” and 
the like, indicates that they were regarded as something apart, as something unhealthy 
and slightly abnormal. We have had to wait for the intelligence test to give us a better 
definition of superior intelligence and to show us that superior intelligence is not nearly
so uncommon as we seem to have imagined. 


Among early investigations in this field, the work of Whipple was prob- 
ably most noteworthy. The monumental work of Terman in his Genetic 
Studies of Genius and the experimental work of Hollingworth were out- 
standing in the discovery that superior children are not necessarily physical 
weaklings, or persons of peculiar temperamental traits and anti-social 
habits. 

Terman’s (401) study of one thousand gifted children is well known. 
Inasmuch as it was the first investigation of large scope, the principal find- 
ings are worthy of reiteration. All the children had I. Q.’s above 140. The 
group of one thousand furnished a larger ratio of boys to girls than would 
be expected in the general population. The social status of the families of 
these children was much higher than the average, although there were in- 
stances of children coming from very poor families. There was a much 
greater proportion of distinguished relatives than would be found in the 
average family. These children were physically superior to the normal con- 
trol group and were healthier than the normal racial stock. There were fewer 
cases of insanity and feeble-mindedness among relatives. As to school status, 
they were 14 percent of their age above the norm in grade location and 48 
percent of their age above normal in intelligence. They showed no more 
unevenness in abilities than do normal children. They were interested in 
much the same sort of things, but their play interests tended to place more 
emphasis on activities involving thinking. They were somewhat more mature, 
quieter, and less sociable. Eighty-five percent of the group were above the 
median of normal children in character and personal traits. 

Much of the material developed in recent years is summarized in Holling- 
worth’s (371) book. In addition, Hollingworth presented data relating to 
several children whose I. Q.’s were above 180, and discussed the educational 
implications of their mental and physical characteristics. Hollingworth and 
Monahan (374) showed that children of superior intelligence were superior 
to normal children in certain motor tests and not especially lacking in any 
motor abilities. Similar results were obtained in their investigation of su- 
perior and normal children in jumping, chinning, and strength of grip 
(388). Superior children were also about as sensitive on the Seashore Musi- 
cal Ability Tests as normal children of similar chronological age (372). 

Cox (357) attempted to measure the mental status of historical characters. 
She collected the boyhood records of three hundred eminent men of history, 
born between 1450 and 1850, and had experts estimate their intelligence 
quotients. The estimates ranged from 100 to 200, with a median of 135. 
Goddard (364) described the education of a group of superior children in
Cleveland. The I. Q.’s of this group ranged from 108 to 172, with a median 
of 128. Of 244 children in this group, 127 were boys and 117 girls. Daniel- 
son (358) similarly described a special class for superior children in Los 
Angeles, in which the average I. Q. was 134 and about one-fourth of the 
class were above 140. Kiefer (376) found no difference between the means 
of superior and normal groups on five motor tests for ages nine, ten, and 
eleven, and no correlation between these tests and intelligence. 

Considerable attention has been given to the subsequent careers of su- 
perior children who were the subjects of earlier studies. Hollingworth 
(373) reported on the mental status of an individual who ten years before 
had an I. Q. of 187. His status at the time of the second report was about 
four P. E.’s above the median level of college students on the CAVD Tests. 
Lincoln (378) found that a group of underage children admitted to school 
on the basis of mental age were above the median in achievement in grades 
four to seven. Witty (410) studied one hundred children with I. Q.’s above 
140. After five years they showed the same superiority in physical character- 
istics over a control group as when first tested, but their I. Q.’s measured 
by the Terman Test were lower than the original Binet I. Q.’s. Lamson (377) 
investigated the high-school records of the group studied earlier by Holling- 
worth. They were significantly superior to students generally in achievement 
and not inferior in health. Their intelligence quotients were all above the 
highest decile for high-school students and more than half were at the top 
percentile for adults generally. Gray’s study (365), mentioned in another 
section of this Review, showed that although young college students of 
superior mental powers did not greatly excel other students in scholastic 
marks, they graduated in less time and took part in a larger number of 
activities. 

Duff (362) studied the careers of a group of children with I. Q.’s above 
135. He compared them with children of normal abilities. Seven of the 
normal group who had subsequently entered secondary schools were found 
to be inferior in achievements to thirteen of the superior group who had not 
spent any time in secondary schools. Individuals in the control or normal 
group were inferior in practically all respects. Even in those points in which 
further education might be expected to bring any individual above the 
achievements of those who had stopped school earlier, no such result was 
achieved in this case. Duff concluded “that higher education cannot com- 
pare with innate intelligence as a differentiating force.” 

The most comprehensive follow-up study is that reported by Burks and 
others (354) in the third volume of Genetic Studies of Genius. After six 
years they found that the mean I. Q. of the younger children had decreased
from 148 when first studied to 139 at the time of the second study.
Most of the decreases were found among the girls. The older children were 
found to be in the 97th to 99th percentiles on the Terman Group Test. The 
majority of all the cases were at approximately the same mental level as
when first tested. Those who were students in Stanford University and those 
who were not had a mean score on the Thorndike Test above the university 
mean. 

The use of psychological tests in the identification of the superior is
subject to many of the same limitations which hold when they are used in 
the identification of the subnormal. In this case, however, psychological 
tests seem to be of somewhat more crucial importance because of the absence 
of many of the social and scholastic factors which are effective, at least in 
part, in establishing mental subnormality. There is often little incentive 
growing out of the régime of the school itself for those of superior mental 
power to make greater use of their abilities than is required to meet the 
demands set for the mediocre. It is only by the use of some diagnostic device, 
apart from the ordinary devices of the school room situation, that mental 
superiority can be identified with any degree of certainty or objectivity. 

The situation with respect to the superior child is well expressed in the 
following quotation from Pintner (392:451). 


The superior child has never been considered a problem in the schools, mainly 
because he has never really been recognized. He almost always can cover the required 
work, and, so doing, fulfills the main requirement of the school. If he is unruly or 
mischievous, the school can and does deal more or less effectively with this type of 
behavior, even though it does not recognize that it may sometimes be a symptom of 
superior intelligence. Again, the school greatly resents the suggestion that it cannot 
recognize superior intelligence, that it is necessary to have psychological tests in order 
to discover it. Does it not daily and monthly pile up a vast array of grades and marks, 
so that the sheep may be separated from the goats, so that it may reward the brightest 
scholars and admonish the laggards? In other words, the school has tacitly assumed that 
the amount of school work accomplished is a direct measure of general intelligence, 
and is only slowly beginning to realize the difference between educational attainment 
and general intelligence. Even today this distinction between knowledge and intelligence 
is not clear in the consciousness of the teacher. She is apt to assume that because a 
child has done good work in the class in which he happens to be, he is, therefore, of 
superior intelligence. And, conversely, if he does merely average or poor work, he is, 
therefore, of normal or subnormal intelligence. 


In the opinion of Hilleboe (370) the application of one or more verbal 
intelligence tests is no more satisfactory in appraising all of the abilities 
which should be taken into account in developing a well-rounded program 
for the superior child than for the dull. Since it is probable, however, that
the emphasis in school work will be placed—and quite naturally—on those 
phases of education which depend upon higher powers of mental function- 
ing, psychological tests are of paramount importance in the selection of 
gifted children for special classes or special attention in regular classes. 


The Delinquent and Dependent 


Early psychological testing of the delinquent was carried on largely in 
institutions; but as social workers and public agencies have concentrated 
more and more on the prevention of delinquency, the use of tests in
connection with courts, welfare centers, and guidance clinics has become more
general. According to Pintner’s (392: 392) statement, although earlier 
studies tended to show a large amount of feeble-mindedness among delin- 
quents, later studies have established the fact that delinquency cannot be 
explained as due to any one cause. 

Tests applied in detention homes, juvenile reformatories, and at behavior 
clinics furnish most of the data regarding the mental status of delinquents. 
In a study of reformatory boys, Adler (347) found that 31 percent were 
feeble-minded and 22 percent were of borderline mentality. Only 4.6 per- 
cent had I. Q.’s above 110. In another report (346) he analyzed the results 
of tests in a state school for boys and showed that Chicago boys had an 
average I. Q. of 82, whereas boys from other parts of the state of Illinois had 
an average I. Q. of 76. In a juvenile detention home, 35 percent of the cases 
were feeble-minded, with I. Q.’s below 70. Asher (348) found the median 
I. Q. of twenty reform school boys to be 67. However, these boys did about 
as well as the average on the Stenquist Assembly Tests. McCaulley (380) 
tested one hundred boys in a disciplinary school and found an I. Q. range 
of 57 to 117, with a median of 85. Sixteen percent were feeble-minded and 
26 percent were borderline cases. Kuhlmann (386) reported that 24 to 42 
percent of the inmates of five Minnesota institutions had I. Q.’s below 75. 
In the Whittier State School, Sullivan (400) found that entering boys had 
an average I. Q. of 90. She attributed this high average to the fact that 
definitely feeble-minded delinquent boys are sent to other institutions. 

Derby (360), who tested girls at the Women’s Protective Association in 
Cleveland, found less than 1 percent to be of superior intelligence. Normal 
expectancy would allow a much greater percentage. The studies of McClure and
Goldberg (381) and Caldwell (355) confirmed earlier reports as to the low 
intelligence of unmarried mothers and of inmates of industrial schools. 

Merrill (385) found an average I. Q. of 82 for juvenile delinquents. 
According to Maris (383), 8 percent of the juvenile delinquents of Manitoba
were feeble-minded and 21 percent were borderline cases. Riley (399) 
found that the mental age of probation boys was higher by about one year 
according to performance tests than when measured by the Binet Test. Cole- 
man (356) found no difference in intelligence between problem boys and 
non-problem boys in high school. Riddle (398) studied clinic records to 
discover the relation between intelligence and stealing. The mean I. Q. of 
those known to steal was 78. Those who did not steal had a mean I. Q. of 70. 
Those who could not be classified definitely with respect to this type of 
offense had a mean I. Q. of 66. Delinquency of this type seems to increase 
with M. A. and I. Q. 

In a study of adult prisoners, Adler (346) found that the distribution of 
I. Q.’s was rather closely comparable to the army distributions based on 
Army Alpha. But Murchison (389), who tested four thousand white
prisoners, found them somewhat better on Army Alpha than the white draft. In
a study of the Illinois parole system, Burgess (375) found that the most
intelligent prisoners violated parole as much as, if not more than, the less 
intelligent prisoners. 

Where delinquency is complicated by defective mentality or abnormal 
mental personality traits, the difficulty of achieving a satisfactory social 
adjustment is greatly increased according to Healy and Bronner (368). 
About 13.5 percent of the juvenile delinquents studied were feeble-minded. 
According to Healy (369), 85 percent of delinquents of normal mentality 
were successful in foster homes, but only 40 percent of delinquents of 
defective mentality were successful. 

Dependency is closely related to delinquency and a common factor of 
defective mentality seems to characterize the two groups. Psychological 
tests of dependent children reported in the fourteen different studies
reviewed by Pintner (392:400) showed amounts of feeble-mindedness ranging
from 5.7 to 39 percent. The same reports showed large amounts of backward
mentality among dependents, ranging as high as 62 percent. Reports of 
mentality above normal were noticeably few in number. Psychological 
studies of adult dependents showed equally large amounts of inferior 
mentality. Davis (359) compared orphanage children in Texas with public- 
school children using the Dearborn Test and the Haggerty Test. According 
to scores on the former, there was over three times as great a percentage 
of orphanage children with I. Q.’s below 70 as of public-school children, 
and only one-seventh as great a percentage with I. Q.’s above 120. Results 
on the Haggerty Tests were somewhat more favorable to the orphans. 


The Deaf 


Outstanding work in the psychological testing of the deaf has been done 
by Pintner and Paterson who first attempted to use a recognized intelligence 
test. Table 2 shows the I. Q.’s of deaf children as determined by Day, Fus- 
field, and Pintner (392:411). 


TABLE 2.—THE INTELLIGENCE OF DEAF CHILDREN OF VARIOUS AGES *

Deaf        Hearing equivalent        I. Q. of deaf
Age 12      Age 10                    83
Age 13      Age 10-6                  81
Age 14      Age 11-0                  79
Age 15      Age 12-0                  80

* This table is to be read as follows: The average mental age of deaf children twelve
years of age is equivalent to that of hearing children ten years old, giving them an average
I. Q. of 83.
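
The I. Q. column of Table 2 follows from the usual ratio definition, mental age divided by chronological age and multiplied by 100. The short Python sketch below reproduces the column from the two age columns. It reads the second row as age thirteen, an assumption that the arithmetic supports; the function name and the conversion of ages to months are illustrative choices, not part of the source.

```python
def ratio_iq(mental_age_months: int, chronological_age_months: int) -> int:
    """Ratio I.Q.: 100 * mental age / chronological age, rounded."""
    return round(100 * mental_age_months / chronological_age_months)


# Each row: (chronological age of the deaf group, hearing-equivalent mental age),
# both written as (years, months). Second row assumed to be age 13.
rows = [
    ((12, 0), (10, 0)),
    ((13, 0), (10, 6)),
    ((14, 0), (11, 0)),
    ((15, 0), (12, 0)),
]

iqs = []
for (ca_years, ca_months), (ma_years, ma_months) in rows:
    iq = ratio_iq(ma_years * 12 + ma_months, ca_years * 12 + ca_months)
    iqs.append(iq)
    print(f"deaf age {ca_years}: I.Q. {iq}")
```

The four computed values are 83, 81, 79, and 80, agreeing with the tabled figures.
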

Drever and Collins (361) devised a performance test with which they
tested deaf children and hearing children. They found no differences be- 
tween the two groups. Drever used his performance scale in testing 1,474 
deaf children between five and sixteen years of age and found the deaf chil- 
dren slightly above the median for hearing children. Pintner (395) found 
no correlation between measures of speech or speech reading and the Pint- 
ner Non-Language Test, but fair correlation between speech reading and an 
educational test. Tests of deaf pupils with the Pintner Non-Language Test 
and the Arthur Performance Scale, made by Brown (353), gave a correla- 
tion of .80 which was reduced to .61 when chronological age was held con- 
stant. Williams (408) gave the Goodenough Drawing Test to a group of 
deaf pupils. The mean I. Q. for the group was 79.5, with about one-fourth 
of the subjects below 70. Pintner (394) tested four thousand deaf children 
with the Pintner Non-Language Test. His results showed a very marked re- 
tardation both of intelligence and achievement on the part of deaf children. 
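
The reduction from .80 to .61 when chronological age is held constant is the kind of result given by a first-order partial correlation. The sketch below shows that computation. The source does not say which formula Brown used, nor does it report the correlation of either test with age, so the two age correlations of .70 are hypothetical values chosen only to show how a raw coefficient of .80 can shrink to about .61.

```python
import math


def partial_correlation(r_xy, r_xz, r_yz):
    """First-order partial correlation of x and y with z held constant."""
    return (r_xy - r_xz * r_yz) / math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# r_xy = .80 is the reported correlation between the two tests.
# The correlations of each test with age (.70 and .70) are HYPOTHETICAL.
r_age_constant = partial_correlation(0.80, 0.70, 0.70)
print(round(r_age_constant, 2))
```

When neither test is correlated with age the partial coefficient equals the raw one; the more strongly both tests rise with age, the more of the raw correlation is attributable to age alone.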

Upshall (404) studied 311 matched pairs of deaf children in day schools 
and institutions. He found the brighter children in the day schools. He first 
attributed the difference to the fact that day-school pupils have more resid- 
ual hearing and become deaf later; but even when these factors were 
equated, the day pupils were superior in achievement. 

Madden (382) compared hard-of-hearing children in regular public
schools with hearing children of the same age, sex, race, and parental
occupation. He found a difference of 6.42 in I. Q. in favor of the hearing
children. From an analysis of individual test items of the Binet Test he
concluded that hard-of-hearing children do not have verbal handicaps that
are not also characteristic of hearing children. A correlation of -.123
between intelligence and auditory loss was obtained after eliminating the
factor of age.


The Blind 


Pioneer work in psychological testing of the blind was done in Cleveland 
by Irwin (392:426). A comparative study made by Hayes of 670 blind chil- 
dren and 1,000 unselected sighted children tested by Terman gave the dis- 
tribution of intelligence shown in Table 3 (392:427). 

More recent studies have not modified these findings appreciably. Hayes 
(367) reported that on tests suitable for both seeing and blind, the blind 
were ten points in I. Q. below the seeing. Myers (390) made a survey of 
sight-saving classes and found that 58.9 percent of the cases tested had I. Q.’s 
lower than 90 and that only 9.4 percent ranked higher than 110. Pintner
(392:430) summarized the work that has been done with the blind as
follows:

A very good beginning has been made in the intelligence testing of the blind. Enough
research work has been done on the construction of scales so that useful methods of
measurement exist. The results of the tests so far published show the blind as a group
somewhat inferior mentally to the sighted. There is evidently a large percentage of
feeble-mindedness among the blind. . . . The difficulty of acquiring language is not
nearly so great among the blind as among the deaf, because the former learn to speak
and talk easily and normally, just as a normal hearing-seeing child does. Lastly, there
is no evidence that compensation for the handicap of blindness exists in the form of
increased sensitivity of touch, hearing, or of a better memory ability.


TABLE 3.—PERCENTAGES OF BLIND AND SIGHTED CHILDREN AT VARYING
LEVELS OF INTELLIGENCE

                        Percentage found among
Classification          The blind      The sighted
[illegible]             0.3            0.5
[illegible]             1.             [illegible]
[illegible]             5.             9.
[illegible]             68.            76.
[illegible]             12.            8.
[illegible]             [illegible]    2.
[illegible]             5.             0.3


Speech Defects


Barnard (349) reviewed thirty-two studies bearing upon the relation of 
intelligence and other factors to defective speech. His conclusions based 
upon this review were as follows: 


(1) Low intelligence frequently accompanies delayed speech and language difficul- 
ties. In some cases, low intelligence accompanies stuttering. . . . 

(2) A study of stutterers shows them to possess every degree of intelligence. . . 

(3) The studies show a wide range of intelligence from low to high with no uniform 
rating. This conclusion applies to persons suffering from all classes of speech defects 
and is the same as the conclusion for stutterers alone. 

(4) The use of intelligence tests may be helpful in indicating symptoms of speech 
defects but not in locating their causes. 


Physical Defects and Abnormalities 


Investigations of the relation between intelligence and bodily condition 
were reviewed critically by Paterson (391). He devoted attention particu- 
larly to a review and criticism of the evidence of the effect of diseased and 
abnormal conditions, malnutrition, and the like, on mental efficiency. Sum- 
marizing the evidence available Paterson (391:211-12, 269) said: 

On the basis of available evidence the notions that dire mental consequences may
arise from physical defects and poor physical condition have been and still are greatly
exaggerated. Such consequences as exist are demonstrable for mankind in the mass
only to a slight degree.

With the exception of diseases and injuries directly involving the central nervous
system itself, it would appear that we cannot explain the tremendous range of individual 
differences in intellect on the hypothesis that unfavorable physical condition or specific 
physical defects are operative as a major causal factor. Apparently nature has so safe- 
guarded the central nervous system as to render normal mental development relatively 
secure or at least strongly immune to such deleterious influences as malnutrition, 
diseased tonsils, enlarged adenoids, defective breathing, defective teeth, simple goiter, 
intestinal toxemia, and even hookworm. . . . 

It appears that such structural characteristics as height and weight are correlated 
only slightly with intelligence, narrowly defined. Even measurements of head size and 
shape are found to be relatively independently variable with respect to intellect, and 
skeletal development measured by precise X-ray photography yields either zero or low 
correlations with intelligence. The same may be said of dentition. Physiological develop- 
ment, measured in terms of pubescence, is found to be relatively unrelated to mental 
development, and so are complicated morphological indexes of body build. 


Paterson also reviewed the bearing of the evidence he had assembled on 
the problem of individual differences and the nature-nurture conflict. His 
evidence and his conclusions contain little solace for the environmentalists. 
The entire study showed that Paterson agreed with Pearson, from whom he 
quoted as follows (391:289):


Intelligence as distinct from knowledge stands out as a congenital character. Let us 
admit finally that the mind of man is for the most part a congenital product, and the 
factors which determine it are racial and familial; we are not dealing with a mutable 
characteristic capable of being moulded by the doctor, the teacher, the parent, or the 
home environment. These may provide the material upon which it can act, and give a 
welcome scope for its activities, but they do not create it. 


The Report of the Committee on Special Education of the White House 
Conference on Child Health and Protection (406) is the most recent and 
comprehensive statement of the needs of atypical groups. 


Summary 


On the basis of an intellectual criterion of differentiation, this chapter 
has reviewed studies dealing with the feeble-minded and with those of 
superior intelligence. Studies of feeble-mindedness have dealt largely with 
its incidence in the total population and with the amount of feeble-minded- 
ness in certain behavior groups. Estimates of feeble-mindedness in the total 
population vary from .4 percent to 6.1 percent. Interest in the superior is 
a recent development in education encouraged principally by the derivation 
and use of psychological tests. It has been demonstrated that gifted students 
are often as superior in all other desirable traits as in intelligence. Evidence 
that there is some tendency for the superior to regress toward the normal 
population at a later age is somewhat disquieting, but this apparent ten- 
dency may be due to absence of valid and reliable measures of highly 
developed personality traits or comparative achievement at later periods.

The groups selected for study on the basis of a non-intellectual criterion
of differentiation include the dependent and delinquent, the blind, the deaf, 
those with speech defects, and those with physical handicaps and abnor- 
malities. The long established belief that dependency and delinquency were 
prima facie evidence of defective mentality is decidedly challenged by the 
studies reviewed. Although there is a noticeable lack of superior intelligence 
among the delinquent, this may be due to the peculiar social and legal con- 
ditions under which delinquent persons come to the attention of psycho- 
logical investigators. There is considerable evidence that the amount of 
defective mentality among the delinquent groups has been greatly over- 
estimated, and that it may not be substantially greater than is to be found 
in the population at large. The dependent as a special group, aside from 
considerations of delinquency, is rather definitely characterized by abnor- 
mally large amounts of defective mentality. 

Most studies indicate that both the blind and the deaf may be expected to 
reveal inferior intelligence, at least insofar as their behavior may be com- 
pared validly with that of sighted and hearing persons. Nevertheless, little 
has been accomplished by way of measuring the intelligence of such persons 
in terms of a behavior criterion which takes into account the sensory defects 
themselves. Speech defects are not definitely identified with particular levels 
of intelligence. 

Neither gross physical characteristics nor abnormal or pathological phys- 
ical conditions, except those directly affecting the central nervous system, 
show any significant relation to mental efficiency. 

















CHAPTER VII 


Vocational Aptitudes Tests and 
Their Applications 


Vocational tests are being examined both critically and uncritically to a
greater extent than heretofore. In public intermediate and high schools, in 
colleges, in business and industrial centers, and in child guidance clinics 
questions are being raised about individual abilities, attitudes, and interests. 
Vocational tests, which are concerned with special and more or less separate 
abilities, will doubtless answer many of these questions. 


Historical Résumé


Hugo Münsterberg (433) may be regarded as the first advocate of vocational
tests as such in the United States. In Germany the movement is well
covered in Giese’s (424) exhaustive treatise of tests and in Baumgarten’s 
(415) historical sketch. In France the work of Lahy in Paris is outstand- 
ing, but French scholars have been slow to approach the field of applied 
psychology. In England the National Institute of Industrial Psychology 
has done a great deal of work with private enterprises and with schools. 
Cox (420), Earle and Macrae (422), Myers, and Muscio also made valu- 
able contributions. With reference to work in Switzerland, the names of 
Claparède, Walther, and Ehinger should be mentioned. Work in Russia
was described by Baumgarten (414). 

Hugo Münsterberg was invited to become director of the Harvard
psychological laboratory in 1892 at the instance of William James. His 
background caused him to take great interest in the life of the American 
people. His books indicate the range of his interests, including such prob- 
lems as the application of psychology to criminal law, to psychotherapy, 
to education, and to industry. In 1911-12 he investigated several of the 
largest manufacturing concerns, and sent out a circular to one thousand 
leading companies, asking what some of the mental traits required of em- 
ployees were. His purpose was to determine a principle by which any 
candidate for any industrial job might be tested at any time. He tested 
telephone girls for memory, attention, intelligence, exactitude, and rapidity. 
He tested street-car motormen and ship pilots. The two of his books most
concerned with vocational tests are Psychology and Industrial
Efficiency and Vocation and Learning. Without attempting to evaluate
Münsterberg’s work we can say that it aroused much interest and suggested
numerous possibilities.

Link (429) devised a number of tests and tried them out under actual 
working conditions during the period of the World War. However, the 
testing program of the United States army, which gave great impetus
to the development of intelligence tests and trade tests, contributed rela- 
tively little toward vocational tests as such. These have been developed, 
for the most part, during the post-war period. 


Vocational Tests 


Any classification of vocational tests is somewhat arbitrary. A test may 
or may not be “vocational” depending on the purpose to which it is put. 
Thus a measure of “height,” which may be intended for any number of 
purposes, becomes a vocational test when, as one measure of individual 
differences, it is included in a battery of tests used in vocational guidance. 
For the purposes of this review the most practical grouping seems to be 
(1) mechanical tests, (2) manual tests, (3) clerical tests, and (4) mis- 
cellaneous tests, including those for special ability in art and music. The 
tests listed are merely representative of the work that is being done. It is 
hoped that individuals interested in the vocational testing movement will 
find suggestions helpful to them in a more thorough study of the field. 

Mechanical tests—The Stenquist Tests (441) of mechanical ability are 
perhaps the best known. They were devised by Dr. John Stenquist and 
applied to several hundred students in the New York public schools. The 
Stenquist Tests were used in the army (449); by Toops (444) with 145 
boys of ages twelve through fifteen; by Scudder and Raubenheimer (438) 
with 114 boys in the seventh and eighth grades of a junior high school; and 
by Commins (419) with 206 individuals (men and women) in a teacher- 
training school in which the ages ranged from seventeen to twenty-one. 
Modifications of the Stenquist Test were used by Earle and Macrae (422) 
and by Paterson, Elliott, and others (434).

Cox (420) approached the analysis of mechanical ability from a differ- 
ent angle. He first used the Stenquist Assembly Test and gave it up because 
“the tests showed little correlation with each other, and were found to 
involve a certain degree of digital strength and skill.” He then devised a 
series of models which called for no manipulation. Certain mechanical 
movements were observed and then analyzed by the subject. The tests were 
given to two groups of untrained subjects, 88 students at a commercial 
school and 114 students in an elementary school; and to one group of 
trained subjects, 228 students in a technical school. The ages ranged from 
ten years to adulthood. The particular feature of the Cox Tests is the separa- 
tion of the factor of ingenuity from that of manipulation. 

Stine (442) used a so-called “measurement test” of his own design 
with 160 full-time students in the eighth, ninth, and tenth grades and 
compared the results with the Minnesota Revision of the Stenquist Test 
and with results from a test of general intelligence. A gain in scores with 
training was observed but no correlation figures were presented in the 
study. He compared shop students with non-shop students and found that
the average Minnesota-Stenquist Score for shop students was only 7.2 
points above the non-shop average. On the Stine Test the difference was 
1.8 points in favor of the shop group. The average I. Q. of the shop group 
was 2.8 points below the non-shop group. 

The 586-page volume by Paterson, Elliott, and others (434) is the most 
thorough-going study of mechanical ability yet published. Preliminary 
tests were given to 217 junior high-school boys, eleven hours being re- 
quired for all the tests. Seven tests were selected for the experiment proper 
and given to 150 boys of a “new entering class,” supposedly of the 7-B 
grade. Additional factors considered were (1) academic success; (2) pre- 
vious experience in mechanical work; (3) interests, occupational and 
academic; (4) motor ability, e. g., agility and gymnasium success; (5) 
height, weight, and vital capacity; (6) social and economic status; and 
(7) home influences. Of the seven tests three proved the most successful: 
spatial relations, paper form board, and assembly. These tests are identi- 
fied as the Minnesota Mechanical Ability Tests. 

Link (429) described the use of three tests with thirty-five men: the 
Stenquist Assembly Test, a form board of his own design, and a cube 
assembly test. The correlations with foremen’s ratings were over .80 but 
the number of cases was too small for generalization. Link’s work is 
marked by his attention to such qualitative factors as rhythm and attitude 
and by his use of actual shop situations. Many tests for vocational selection 
are described in his book. 
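
Why thirty-five cases are too few for generalization can be illustrated with an approximate confidence interval for a correlation coefficient. The sketch below uses the Fisher z transformation, a standard modern approximation that is not part of Link's study. The figure r = .80 stands in for his reported “over .80,” and the function name and the 95 percent level are assumptions.

```python
import math


def correlation_interval(r, n, z_crit=1.96):
    """Approximate confidence interval for a correlation via Fisher's z.

    r is the observed coefficient, n the number of cases, and z_crit the
    normal deviate for the desired level (1.96 for about 95 percent).
    """
    z = math.atanh(r)                    # Fisher z transformation of r
    se = 1.0 / math.sqrt(n - 3)          # approximate standard error of z
    return math.tanh(z - z_crit * se), math.tanh(z + z_crit * se)


# Illustrative inputs: r = .80 with the thirty-five men Link tested.
low, high = correlation_interval(0.80, 35)
print(round(low, 2), round(high, 2))
```

For r = .80 and 35 cases the interval runs from roughly .64 to .89, so a coefficient of this size from so small a group is compatible with a considerably weaker relationship than the figure alone suggests.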

Keane and O’Connor (428) described a measure of mechanical aptitude, 
consisting of a rectangular prism of wood cut into nine wiggly pieces. To 
administer the test, the disassembled parts are presented to the subject for
assembly. It was given to 868 unselected persons in a large industry. Of 
these persons 44 engineers, 114 mechanics, and 81 draftsmen scored con- 
siderably above the unselected group. Three hundred applicants for 
mechanical occupations were tested and assigned mechanical work. Within 
six months 74 percent of the D grade men, but only 31 percent of those 
who rated A, left their jobs. The tests are described in more detail by 
O’Connor in a separate volume. 

Baker and Crockett (413) devised a mechanical aptitudes group test 
that is regularly used in the Detroit schools. Considerable unpublished 
material on mechanical ability is available in the Detroit Psychological 
Clinic. 


Manual-motor tests—The early literature on manual or motor tests was 
reviewed by Whipple (448). Link (429) used a number of tests of the 
manual type, including, in particular, a form board and a revolving dial 
for serial movement. A wealth of German material on motor tests was
. . . tions to a knowledge of manual skill, primarily from the standpoint of
industrial efficiency. Hull (427) discussed in detail the theoretical con- 
siderations underlying manual or motor ability, and included references 
to McFarlane (430) and Perrin (435). MacQuarrie (431) described a test 
of mechanical ability that is essentially the task of guiding a pencil through
various motor performances. Paterson, Elliott, and others (434) included 
several manual-motor tests in the mechanical ability battery. In particular 
they extended the Link Form Board into a more reliable test. The excellent 
bibliography in their volume refers to a number of other manual-motor 
tests. Crockett (421) described a measure of manual ability which was 
given to over one thousand individuals in the Detroit schools and correlated 
with employee performances in several industries. 

Clerical tests—Rogers (436) examined forty-five subjects taking steno- 
graphic work at Columbia University, using the Woodworth-Wells Test 
and the Trabue Language Tests. Tuttle (445) tested twenty students in 
beginning typewriting, and found tests of “motor action,” attention and 
accuracy, and substitution the most successful. Bills (417) tested 139 sub- 
jects who were taking a night-school course in stenography and were also 
employed. She used a general intelligence test, a special aptitude test 
composed of five parts, and a will-temperament test of ten parts. Link (429) 
used a test for spelling, substitution, and sentence completion. These or 
relevant tests were given to 300 seniors in a commercial high school, 76 
pupils in two business schools, 22 office typists, 19 stenographers, over 
400 candidates for typing and stenographic positions, 140 comptometrists, 
and to more than 120 candidates for comptometry. Thurstone (443) de- 
vised tests for typists and stenographers but the test manual includes no 
data as to subjects tested. Yoakum and Bills (450) presented an excellent 
overview of tests for office occupations including (1) general ability tests, 
(2) special aptitude tests, and (3) proficiency or trade tests. 

Miscellaneous tests—The technics involved in vocational testing are de- 
scribed by Griffitts (426), Hull (427), Bingham and Freyd (418), Link 
(429), and others. Tests of music and drawing aptitude are described by 
Seashore (439), Stanton (440), Ayer (412), Fischlovitz (423), and 
Manuel (432). Two excellent bibliographies on psychology in industry 
have been prepared by Viteles (447), who is one of the best informed 
industrial psychologists in the United States. 